Methods for using artificial polynucleotides and compositions thereof to reduce transgene silencing

ABSTRACT

The materials and methods disclosed provide for polynucleotide molecules sufficiently divergent from polynucleotides naturally contained in plants, or from polynucleotides previously introduced into plants as transgenes, to permit trait stacking in plant breeding methods or plant transformation methods. The disclosure also provides for methods and compositions to detect the polynucleotides of the invention in plants.

This Application is a §371 U.S. national phase application of International Application No. PCT/US03/021551, filed Jul. 10, 2003, and claims the benefit of priority to U.S. Provisional Application No. 60/396,665, filed Jul. 18, 2002.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to plant genetic engineering. More particularly, it relates to a method for constructing an artificial polynucleotide and to methods of use to reduce transgene silencing in plants. The invention also relates to plant cells containing the artificial polynucleotide, in which a plant cell is transformed to express the artificial polynucleotide, and to the plant regenerated therefrom.

2. Description of the Related Art

Heterologous genes may be isolated from a source other than the plant into which they will be transformed, or they may be modified or designed to have different or improved qualities. Particularly desirable traits or qualities of interest for plant genetic engineering include, but are not limited to, resistance to insects, fungal diseases, and other pests and disease-causing agents, tolerances to herbicides, enhanced stability or shelf-life, yield, environmental stress tolerances, and nutritional enhancements.

Traditional molecular biological methods for generating novel genes and proteins generally involved random or directed mutagenesis. An example of random mutagenesis is a recombination technique known as “DNA shuffling” as disclosed in U.S. Pat. Nos. 5,605,793; 5,811,238; 5,830,721; 5,837,458 and International Applications WO 98/31837 and WO 99/65927, all of which are incorporated herein by reference in their entirety. An alternative method of molecular evolution involves a staggered extension process (StEP) for in vitro mutagenesis and recombination of nucleic acid molecule sequences, as disclosed in U.S. Pat. No. 5,965,408, incorporated herein by reference. An example of directed mutagenesis is the introduction of a point mutation at a specific site in a polypeptide.

An alternative approach, useful when the heterologous gene is from a non-plant source, is to design an artificial insecticidal gene that uses the most frequently used codons in a maize codon usage table (Koziel et al., 1993, Biotechnology 11, 194-200). Fischhoff and Perlak (U.S. Pat. No. 5,500,365, incorporated herein by reference) report higher expression of Bacillus thuringiensis (Bt) insecticidal protein in crop plants when the polynucleotide sequence was modified to reduce the occurrence of destabilizing sequences. It was necessary to modify the wild type Bt polynucleotide sequence because the wild type full length Bt polynucleotide did not express sufficient levels of insecticidal protein in plants to be agronomically useful.

Heterologous genes are cloned into vectors suitable for plant transformation. Transformation and regeneration techniques useful to incorporate heterologous genes into a plant's genome are well known in the art. The gene can then be expressed in the plant cell to exhibit the added characteristic or trait. However, heterologous genes that normally express well as transgenes may experience gene silencing when more than one copy of the same gene is expressed in the same plant. This may occur when a first heterologous gene is too similar to an endogenous gene DNA sequence in the plant. Other examples include when a transgenic plant is subsequently crossed to other transgenic plants having the same or similar transgenes, or when the transgenic plant is retransformed with a plant expression cassette that contains the same or a similar gene. Similarly, gene silencing may occur if trait stacking employs the same genetic elements used to direct expression of the transgene of interest. In order to stack traits, stable transgenic lines should be produced with different combinations of genes and genetic elements to avoid gene silencing.

N-phosphonomethylglycine, also known as glyphosate, is a well-known herbicide that has activity on a broad spectrum of plant species. Glyphosate is the active ingredient of Roundup® (Monsanto Co.), a safe herbicide having a desirably short half-life in the environment. When applied to a plant surface, glyphosate moves systemically through the plant. Glyphosate is phytotoxic due to its inhibition of the shikimic acid pathway, which provides a precursor for the synthesis of aromatic amino acids. Glyphosate inhibits the enzyme 5-enolpyruvyl-3-phosphoshikimate synthase (EPSPS).

Glyphosate tolerance can also be achieved by the expression of EPSPS variants that have lower affinity for glyphosate and therefore retain their catalytic activity in the presence of glyphosate (U.S. Pat. No. 5,633,435, herein incorporated by reference). Enzymes that degrade glyphosate in plant tissues (U.S. Pat. No. 5,463,175) are also capable of conferring cellular tolerance to glyphosate. Such genes are used for the production of transgenic crops that are tolerant to glyphosate, thereby allowing glyphosate to be used for effective weed control with minimal concern of crop damage. For example, glyphosate tolerance has been genetically engineered into corn (U.S. Pat. No. 5,554,798), wheat (U.S. Patent Application No. 20020062503), soybean (U.S. Patent Application No. 20020157139) and canola (WO 9204449), all of which are incorporated by reference. The transgenes for glyphosate tolerance and the transgenes for tolerance to other herbicides, e.g., the bar gene (Toki et al., Plant Physiol. 100:1503-1507, 1992; Thompson et al., EMBO J. 6:2519-2523, 1987, phosphinothricin acetyltransferase, BAR gene isolated from Streptomyces; DeBlock et al., EMBO J. 6:2513-2522, 1987, glufosinate herbicide), are also useful as selectable markers or scorable markers and can provide a useful phenotype for selection of plants linked with other agronomically useful traits.

What is needed in the art are methods to design genes for expression in plants to improve agronomically useful traits, where the designed genes avoid both gene silencing when multiple copies are inserted and recombination with endogenous plant genes.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1. Pileup comparison of the polynucleotide sequence changes of two artificial rice EPSPS versions (OsEPSPS_AT, OsEPSPS_ZM) and a native rice EPSPS (OsEPSPS_Nat), the polypeptide of each modified to be glyphosate resistant.

FIG. 2. Pileup comparison of the polynucleotide sequences of a native (ZmEPSPS_Nat) and an artificial corn EPSPS (ZmEPSPS_ZM), the polypeptide of each modified to be glyphosate resistant.

FIG. 3. Pileup comparison of the polynucleotide sequences of a soybean native EPSPS (GmEPSPS_Nat) and artificial version (GmEPSPS_GM), the polypeptide of each modified to be glyphosate resistant.

FIG. 4. Pileup comparison of the polynucleotide sequences of a native BAR gene (BAR1_Nat) and two artificial versions with Zea mays (BAR1_ZM) and Arabidopsis thaliana (BAR1_AT) codon bias.

FIG. 5. Pileup comparison of the polynucleotide sequences of CTP2 and CP4EPSPS native (CTP2CP4_Nat) and artificial versions (CTP2CP4_AT, CTP2CP4_ZM, and CTP2CP4_GM).

FIG. 6. Plasmid map of pMON54949.

FIG. 7. Plasmid map of pMON54950.

FIG. 8. Plasmid map of pMON30151.

FIG. 9. Plasmid map of pMON59302.

FIG. 10. Plasmid map of pMON59307.

FIG. 11. Plasmid map of pMON42411.

FIG. 12. Plasmid map of pMON58400.

FIG. 13. Plasmid map of pMON58401.

FIG. 14. Plasmid map of pMON54964.

FIG. 15. Plasmid map of pMON25455.

FIG. 16. Plasmid map of pMON30152.

FIG. 17. Plasmid map of pMON54992.

FIG. 18. Plasmid map of pMON54985.

FIG. 19. Plasmid map of pMON20999.

FIG. 20. Plasmid map of pMON45313.

FIG. 21. Plasmid map of pMON59308.

FIG. 22. Plasmid map of pMON59309.

FIG. 23. Plasmid map of pMON59313.

FIG. 24. Plasmid map of pMON59396.

FIG. 25. Plasmid map of pMON25496.

BRIEF DESCRIPTION OF SEQUENCE LISTING

SEQ ID NO:1 (OsEPSPS_TIPS). A rice EPSPS protein sequence modified to be glyphosate resistant, with chloroplast transit peptide.

SEQ ID NO:2 (OsEPSPS_Nat). Polynucleotide sequence of a rice native EPSPS polynucleotide modified to encode a glyphosate resistant protein.

SEQ ID NO:3 (OsEPSPS_AT). Polynucleotide sequence of an artificial rice EPSPS polynucleotide using the Arabidopsis codon usage table and the methods of the present invention, and further modified to encode a glyphosate resistant protein.

SEQ ID NO:4 (OsEPSPS_ZM). Polynucleotide sequence of an artificial rice EPSPS polynucleotide using the Zea mays codon usage table and the methods of the present invention, and further modified to encode a glyphosate resistant protein.

SEQ ID NO:5 (GmEPSPS_IKS). A soybean EPSPS protein sequence modified to be glyphosate resistant, with chloroplast transit peptide.

SEQ ID NO:6 (GmEPSPS_Nat). Polynucleotide sequence of a soybean native EPSPS polynucleotide modified to encode a glyphosate resistant protein.

SEQ ID NO:7 (GmEPSPS_GM). Polynucleotide sequence of an artificial soybean EPSPS polynucleotide using the Glycine max codon usage table and the methods of the present invention, and further modified to encode a glyphosate resistant protein.

SEQ ID NO:8 (ZmEPSPS_TIPS). A corn EPSPS protein sequence modified to be glyphosate resistant, with chloroplast transit peptide.

SEQ ID NO:9 (ZmEPSPS_Nat). Polynucleotide sequence of a corn native EPSPS polynucleotide modified to encode a glyphosate resistant protein.

SEQ ID NO:10 (ZmEPSPS_ZM). Polynucleotide sequence of an artificial corn EPSPS polynucleotide using the Zea mays codon usage table and the methods of the present invention, and further modified to encode a glyphosate resistant protein.

SEQ ID NO:11 (CTP2). Protein sequence of the chloroplast transit peptide 2 from Arabidopsis EPSPS gene.

SEQ ID NO:12 (CTP2_Nat). Polynucleotide sequence of the chloroplast transit peptide from Arabidopsis EPSPS.

SEQ ID NO:13 (CTP2_AT). Polynucleotide sequence of an artificial polynucleotide encoding the CTP2 using the Arabidopsis codon usage table and the methods of the present invention.

SEQ ID NO:14 (CTP2_ZM). Polynucleotide sequence of an artificial polynucleotide encoding the CTP2 using the Zea mays codon usage table and the methods of the present invention.

SEQ ID NO:15 (CP4EPSPS). The protein sequence of the glyphosate resistant EPSPS protein from Agrobacterium strain CP4.

SEQ ID NO:16 (CP4EPSPS_Nat). Polynucleotide sequence of the native polynucleotide encoding the CP4EPSPS protein (U.S. Pat. No. 5,633,435).

SEQ ID NO:17 (CP4EPSPS_AT). Polynucleotide sequence of an artificial polynucleotide encoding the CP4EPSPS protein using the Arabidopsis codon usage table and the methods of the present invention.

SEQ ID NO:18 (CP4EPSPS_ZM). Polynucleotide sequence of an artificial polynucleotide encoding the CP4EPSPS protein using the Zea mays codon usage table and the methods of the present invention.

SEQ ID NO:19 (BAR1). The protein sequence of a phosphinothricin acetyltransferase.

SEQ ID NO:20 (BAR1_Nat). Polynucleotide sequence of the native polynucleotide isolated from Streptomyces encoding the phosphinothricin acetyltransferase.

SEQ ID NO:21 (BAR1_AT). Polynucleotide sequence of an artificial polynucleotide encoding the phosphinothricin acetyltransferase using the Arabidopsis codon usage table and the methods of the present invention.

SEQ ID NO:22 (BAR1_ZM). Polynucleotide sequence of an artificial polynucleotide encoding the phosphinothricin acetyltransferase using the Zea mays codon usage table and the methods of the present invention.

SEQ ID NO:23 (CP4EPSPS_Syn). Polynucleotide sequence of an artificial polynucleotide with dicot codon bias.

SEQ ID NO:24 (CP4EPSPS_AT_p1). DNA primer molecule diagnostic for the CP4EPSPS_AT polynucleotide.

SEQ ID NO:25 (CP4EPSPS_AT_p2). DNA primer molecule diagnostic for the CP4EPSPS_AT polynucleotide.

SEQ ID NO:26 (CP4EPSPS_ZM_p1). DNA primer molecule diagnostic for the CP4EPSPS_ZM polynucleotide.

SEQ ID NO:27 (CP4EPSPS_ZM_p2). DNA primer molecule diagnostic for the CP4EPSPS_ZM polynucleotide.

SEQ ID NO:28 (CP4EPSPS_Nat_p1). DNA primer molecule diagnostic for the CP4EPSPS_Nat polynucleotide.

SEQ ID NO:29 (CP4EPSPS_Nat_p2). DNA primer molecule diagnostic for the CP4EPSPS_Nat polynucleotide.

SEQ ID NO:30 (CP4EPSPS_Syn_p1). DNA primer molecule diagnostic for the CP4EPSPS_Syn polynucleotide.

SEQ ID NO:31 (CP4EPSPS_Syn_p2). DNA primer molecule diagnostic for the CP4EPSPS_Syn polynucleotide.

SEQ ID NO:32 (ZmAdh1 primer1). Control primer 1 diagnostic for endogenous corn Adh1 gene.

SEQ ID NO:33 (ZmAdh1 primer2). Control primer 2 diagnostic for endogenous corn Adh1 gene.

SEQ ID NO:34 (GNAGIAMKS). Motif providing glyphosate resistance to a plant EPSPS.

SEQ ID NO:35 (CTPEPSPSCP4_GM). Polynucleotide sequence of an artificial polynucleotide encoding the CP4EPSPS protein using the Glycine max codon usage table.

SUMMARY OF THE INVENTION

The present invention provides methods and compositions to design an artificial polynucleotide sequence that encodes a protein of interest, wherein the artificial polynucleotide is substantially divergent from a polynucleotide naturally occurring in a plant or a polynucleotide that has been introduced as a transgene into a plant, and wherein the artificial polynucleotide and said polynucleotide encode a substantially identical polypeptide.

The artificial polynucleotides of the present invention encode proteins that provide agronomically useful phenotypes to a transgenic plant containing a DNA construct comprising the artificial polynucleotide. The agronomically useful phenotypes include, but are not limited to: drought tolerance, increased yield, cold tolerance, disease resistance, insect resistance and herbicide tolerance.

Another aspect of the present invention is artificial polynucleotides that encode a herbicide resistant EPSPS protein, a phosphinothricin acetyltransferase protein, or a chloroplast transit peptide. In preferred embodiments of the present invention, the artificial polynucleotide molecule is selected from the group consisting of SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:21, SEQ ID NO:22, and SEQ ID NO:35.

The present invention provides DNA constructs comprising: a promoter molecule that functions in plants, operably linked to an artificial polynucleotide molecule of the present invention, wherein the artificial polynucleotide molecule is selected from the group consisting of SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:21, SEQ ID NO:22, and SEQ ID NO:35, operably linked to a transcription termination region.

The present invention further provides DNA constructs comprising: a promoter molecule that functions in plants, operably linked to an artificial polynucleotide molecule that encodes a chloroplast transit peptide, operably linked to a heterologous glyphosate resistant EPSPS, operably linked to a transcription termination signal region, wherein the artificial polynucleotide is substantially divergent in polynucleotide sequence from known polynucleotides encoding an identical chloroplast transit peptide.

The present invention provides DNA constructs comprising at least two expression cassettes, the first expression cassette comprising a promoter molecule that functions in plants, operably linked to an artificial polynucleotide molecule of the present invention, operably linked to a transcription termination signal region, and the second expression cassette comprising a promoter molecule that functions in plants, operably linked to a polynucleotide molecule that encodes a substantially identical polypeptide as said artificial polynucleotide and is less than eighty-five percent similar in polynucleotide sequence to said artificial polynucleotide, operably linked to a transcription termination signal region.

The present invention provides plant cells, plants or progeny thereof comprising a DNA construct of the present invention. Of particular interest are plants or progeny thereof selected from the group consisting of wheat, corn, rice, soybean, cotton, potato, canola, turf grass, forest trees, grain sorghum, vegetable crops, ornamental plants, forage crops, and fruit crops.

A method of the present invention reduces gene silencing during breeding of transgenic plants and comprises the steps of:

a) constructing an artificial polynucleotide that is substantially divergent from known polynucleotides that encode a substantially identical protein; and

b) constructing a DNA construct containing said artificial polynucleotide molecule; and

c) transforming said DNA construct into a plant cell; and

d) regenerating said plant cell into a transgenic plant; and

e) crossing said transgenic plant with a fertile plant, wherein said fertile plant contains a polynucleotide molecule that encodes a protein substantially identical to a protein encoded by said artificial polynucleotide molecule and wherein said artificial polynucleotide molecule and said polynucleotide molecule are substantially divergent.

Another aspect of the invention is a transgenic plant cell comprising two polynucleotides, wherein at least one of the polynucleotides is a transgene and the two polynucleotides encode a substantially identical protein and are less than eighty-five percent similar in polynucleotide sequence.

Another aspect of the present invention is a method to reduce gene silencing during production of transgenic plants, comprising the steps of:

a) constructing an artificial polynucleotide that is substantially divergent from known polynucleotides that encode a substantially identical protein; and

b) constructing a first DNA construct containing said artificial polynucleotide molecule; and

c) transforming said first DNA construct into a plant cell; and

d) regenerating said plant cell into a transgenic plant; and

e) retransforming a cell from said transgenic plant with a second DNA construct comprising a polynucleotide molecule that encodes a substantially identical protein to said artificial polynucleotide, wherein said polynucleotide and said artificial polynucleotide are substantially divergent in polynucleotide sequence; and

f) regenerating said cell of step e into a transgenic plant comprising both said artificial polynucleotide and said polynucleotide.

Further provided by the present invention are methods for selection of plants transformed with a DNA construct of the invention, comprising the steps of:

a) transforming a plant cell with a DNA construct of the present invention; and

b) culturing said plant cell in a selective medium containing a herbicide selected from the group consisting of: glyphosate and glufosinate, to selectively kill cells which have not been transformed with said DNA construct; and

c) regenerating said plant cell into a fertile plant.

Another aspect of the invention is a method of detecting an artificial polynucleotide in a transgenic plant cell, plant or progeny thereof comprising the steps:

a) contacting a DNA sample isolated from said plant cell, plant or progeny thereof with a DNA molecule, wherein said DNA molecule comprises at least one DNA molecule of a pair of DNA molecules that when used in a nucleic-acid amplification reaction produces an amplicon that is diagnostic for said artificial polynucleotide molecule selected from the group consisting of: SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:21, SEQ ID NO:22, and SEQ ID NO:35; and

b) performing a nucleic acid amplification reaction, thereby producing the amplicon; and

c) detecting the amplicon.

Reagents provided for performing the detection method above include, but are not limited to: DNA molecules that specifically hybridize to an artificial polynucleotide molecule selected from the group consisting of: SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:21, and SEQ ID NO:22; and isolated DNA molecules selected from the group consisting of: SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, and SEQ ID NO:27.

The present invention provides plants and progeny comprising a DNA molecule selected from the group consisting of: SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, and SEQ ID NO:27.

The present invention provides pairs of DNA molecules selected from the group comprising: a first DNA molecule and a second DNA molecule, wherein the first DNA molecule is SEQ ID NO:24 or its complement and the second DNA molecule is SEQ ID NO:25 or its complement and the pair of DNA molecules when used in a DNA amplification method produce an amplicon; and a first DNA molecule and a second DNA molecule, wherein the first DNA molecule is SEQ ID NO:26 or its complement and the second DNA molecule is SEQ ID NO:27 or its complement and the pair of DNA molecules when used in a DNA amplification method produce an amplicon, wherein the amplicon is diagnostic for the presence of an artificial polynucleotide of the present invention in the genome of a transgenic plant.

The present invention provides for a plant and progeny thereof identified by a DNA amplification method to contain in its genome a DNA molecule selected from the group consisting of: SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, and SEQ ID NO:27.

The present invention provides and contemplates DNA detection kits comprising: at least one DNA molecule of sufficient length to be specifically homologous or complementary to an artificial polynucleotide selected from the group consisting of SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:21, and SEQ ID NO:22, wherein said DNA molecule is useful as a DNA probe or DNA primer; or at least one DNA molecule homologous or complementary to a DNA primer molecule selected from the group consisting of: SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, and SEQ ID NO:27.

The present invention further provides a method of detecting the presence of an artificial polynucleotide encoding a glyphosate resistant EPSPS in a DNA sample, the method comprising:

(a) extracting a DNA sample from a plant; and

(b) contacting the DNA sample with a labeled DNA molecule of sufficient length to be specifically homologous or complementary to an artificial polynucleotide selected from the group consisting of: SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:17, and SEQ ID NO:18, wherein said labeled DNA molecule is a DNA probe; and

(c) subjecting the sample and DNA probe to stringent hybridization conditions; and

(d) detecting the DNA probe hybridized to the DNA sample.

The present invention provides for an isolated polynucleotide that encodes an EPSPS enzyme, wherein the EPSPS enzyme contains the motif of SEQ ID NO:34. The present invention provides for a DNA construct containing a polynucleotide that encodes the EPSPS enzyme with the motif of SEQ ID NO:34. A plant cell, plant or progeny thereof that is tolerant to glyphosate as a result of expressing an EPSPS enzyme that contains the motif of SEQ ID NO:34 is an aspect of the invention.

DETAILED DESCRIPTION OF THE INVENTION

The following definitions are provided to better define the present invention and to guide those of ordinary skill in the art in the practice of the present invention. Unless otherwise noted, terms are to be understood according to conventional usage by those of ordinary skill in the relevant art. Definitions of common terms in molecular biology may also be found in Rieger et al., Glossary of Genetics: Classical and Molecular, 5th edition, Springer-Verlag: New York, (1991); and Lewin, Genes V, Oxford University Press: New York, (1994). The nomenclature for DNA bases as set forth at 37 CFR § 1.822 is used. The standard one- and three-letter nomenclature for amino acid residues is used.

“Amino-acid substitutions” and “amino-acid variants” are preferably substitutions of a single amino-acid residue for another amino-acid residue at any position within the protein. Substitutions, deletions, insertions or any combination thereof can be combined to arrive at a final construct.

An “artificial polynucleotide” as used in the present invention is a DNA sequence designed according to the methods of the present invention and created as an isolated DNA molecule for use in a DNA construct that provides expression of a protein in host cells, and for the purposes of cloning into appropriate constructs or other uses known to those skilled in the art. Computer programs are available for these purposes, including but not limited to the “BestFit” or “Gap” programs of the Sequence Analysis Software Package, Genetics Computer Group (GCG), Inc., University of Wisconsin Biotechnology Center, Madison, Wis. 53711. The artificial polynucleotide may be created by one or more methods known in the art, which include, but are not limited to, overlapping PCR. An artificial polynucleotide of the present invention is substantially divergent from other polynucleotides that code for the identical or nearly identical protein.

The term “chimeric” refers to a fusion nucleic acid or protein sequence. A chimeric nucleic acid coding sequence is comprised of two or more sequences joined in-frame that encode a chimeric protein. A chimeric gene refers to the multiple genetic elements derived from heterologous sources comprising a gene.

The phrases “coding sequence”, “open reading frame”, and “structural sequence” refer to the region of continuous sequential nucleic acid triplets encoding a protein, polypeptide, or peptide sequence.

“Codon” refers to a sequence of three nucleotides that specifies a particular amino acid.

“Codon usage” or “codon bias” refers to the frequency of use of codons encoding amino acids in the coding sequences of organisms.
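As an illustration of this definition only, the following minimal sketch tabulates codon usage from a set of coding sequences as the fraction of times each codon is used among the synonymous codons for its amino acid. The function name `codon_usage`, the use of the standard genetic code, and the example sequence are assumptions made for illustration; they are not taken from the codon usage tables referred to in this disclosure.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" marks stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codon_usage(coding_seqs):
    """Return {codon: fraction of use among codons for the same amino acid}.

    Hypothetical helper: each sequence is assumed to start in frame;
    a trailing partial codon and any non-ACGT triplets are ignored.
    """
    counts = Counter()
    for seq in coding_seqs:
        seq = seq.upper().replace("U", "T")
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    # Total observations per amino acid, used to normalize each codon count.
    totals = Counter()
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n
    return {codon: n / totals[CODON_TABLE[codon]] for codon, n in counts.items()}


# Illustrative input only: Met, Ala, Ala, Ala, stop.
usage = codon_usage(["ATGGCTGCTGCCTAA"])
```

In this toy input alanine is encoded twice by GCT and once by GCC, so the table records GCT at two thirds and GCC at one third of alanine codons.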

“Complementarity” and “complement”, when referring to nucleic acid sequences, refer to the specific binding of adenine to thymine (uracil in RNA) and cytosine to guanine on opposite strands of DNA or RNA.

“Construct” refers to the heterologous genetic elements operably linked to each other making up a recombinant DNA molecule and may comprise elements that provide expression of a DNA polynucleotide molecule in a host cell and elements that provide maintenance of the construct.

“C-terminal region” refers to the region of a peptide, polypeptide, or protein chain from the middle thereof to the end that carries the amino acid having a free carboxyl group.

The term “divergent”, as used herein, refers to the comparison of polynucleotide molecules that encode the same or nearly the same protein or polypeptide. The four letter genetic code (A, G, C, and T/U) comprises three letter codons that direct t-RNA molecules to assemble amino acids into a polypeptide from an mRNA template. The genetic code is referred to as degenerate because more than one codon may code for the same amino acid. Degenerate codons are used to construct substantially divergent polynucleotide molecules that encode the same polypeptide, where these molecules are less than 85% identical in nucleotide sequence over their entire length and share no stretch of identical polynucleotide sequence greater than 23 nucleotides.
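The two criteria in this definition (less than 85% identity over the entire length, and no identical stretch greater than 23 nucleotides) can be sketched as a simple check. This is a minimal illustration under stated assumptions, not the comparison procedure of the invention: the two coding sequences are assumed to be the same length and aligned position by position without gaps, and identical stretches are only counted at matching positions. The function names are hypothetical. Sequences that differ in length would first need an alignment, for example with a program such as those named in the definition of “artificial polynucleotide”.

```python
def percent_identity(a, b):
    """Percent of positions at which two equal-length sequences match."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def longest_identical_run(a, b):
    """Length of the longest run of consecutive matching positions."""
    longest = run = 0
    for x, y in zip(a.upper(), b.upper()):
        run = run + 1 if x == y else 0
        longest = max(longest, run)
    return longest


def is_substantially_divergent(a, b, max_identity=85.0, max_run=23):
    """Apply both criteria: identity below 85% and no identical run over 23 nt."""
    return (percent_identity(a, b) < max_identity
            and longest_identical_run(a, b) <= max_run)


# Illustrative only: two ways of encoding Ala-Ala-Ala with degenerate codons.
native_like = "GCTGCTGCT"
redesigned = "GCCGCAGCG"
divergent = is_substantially_divergent(native_like, redesigned)
```

The two example sequences encode the same tripeptide but match at only six of nine positions, with no matching run longer than two nucleotides, so the check reports them as divergent.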

The term “encoding DNA” refers to chromosomal DNA, plasmid DNA, cDNA, or artificial DNA polynucleotide that encodes any of the proteins discussed herein.

The term “endogenous” refers to materials originating from within an organism or cell.

“Endonuclease” refers to an enzyme that hydrolyzes double stranded DNA at internal locations.

“Exogenous” refers to materials originating from outside of an organism or cell. This typically applies to nucleic acid molecules used in producing transformed or transgenic host cells and plants.

“Exon” refers to the portion of a gene that is actually translated into protein, i.e., a coding sequence.

The term “expression” refers to the transcription or translation of a polynucleotide to produce a corresponding gene product, an RNA or protein.

“Fragments”. A fragment of a gene is a portion of a full-length polynucleic acid molecule that is of at least a minimum length capable of transcription into an RNA, translation into a peptide, or useful as a probe or primer in a DNA detection method.

The term “gene” refers to chromosomal DNA, plasmid DNA, cDNA, artificial DNA polynucleotide, or other DNA that encodes a peptide, polypeptide, protein, or RNA molecule, and the genetic elements flanking the coding sequence that are involved in the regulation of expression.

The term “genome” as it applies to viruses encompasses all of the nucleic acid sequence contained within the capsid of the virus. The term “genome” as it applies to bacteria encompasses both the chromosome and plasmids within a bacterial host cell. Encoding nucleic acids of the present invention introduced into bacterial host cells can therefore be either chromosomally-integrated or plasmid-localized. The term “genome” as it applies to plant cells encompasses not only chromosomal DNA found within the nucleus, but organelle DNA found within subcellular components of the cell. Nucleic acids of the present invention introduced into plant cells can therefore be either chromosomally-integrated or organelle-localized.

“Glyphosate” refers to N-phosphonomethylglycine and its salts. Glyphosate is the active ingredient of Roundup® herbicide (Monsanto Co.). Plant treatments with “glyphosate” refer to treatments with the Roundup® or Roundup Ultra® herbicide formulation, unless otherwise stated. Glyphosate as N-phosphonomethylglycine and its salts (not formulated Roundup® herbicide) are components of synthetic culture media used for the selection of bacteria and plant tolerance to glyphosate or used to determine enzyme resistance in in vitro biochemical assays.

“Heterologous DNA” sequence refers to a polynucleotide sequence that originates from a foreign source or species or, if from the same source, is modified from its original form.

“Homologous DNA” refers to DNA from the same source as that of the recipient cell.

“Hybridization” refers to the ability of a strand of nucleic acid to join with a complementary strand via base pairing. Hybridization occurs when complementary sequences in the two nucleic acid strands bind to one another. The nucleic acid probes and primers of the present invention hybridize under stringent conditions to a target DNA sequence. Any conventional nucleic acid hybridization or amplification method can be used to identify the presence of DNA from a transgenic event in a sample. Nucleic acid molecules or fragments thereof are capable of specifically hybridizing to other nucleic acid molecules under certain circumstances. As used herein, two nucleic acid molecules are said to be capable of specifically hybridizing to one another if the two molecules are capable of forming an anti-parallel, double-stranded nucleic acid structure. A nucleic acid molecule is said to be the “complement” of another nucleic acid molecule if they exhibit complete complementarity. As used herein, molecules are said to exhibit “complete complementarity” when every nucleotide of one of the molecules is complementary to a nucleotide of the other. Two molecules are said to be “minimally complementary” if they can hybridize to one another with sufficient stability to permit them to remain annealed to one another under at least conventional “low-stringency” conditions. Similarly, the molecules are said to be “complementary” if they can hybridize to one another with sufficient stability to permit them to remain annealed to one another under conventional “high-stringency” conditions. Conventional stringency conditions are described by Sambrook et al., 1989, and by Haymes et al., In: Nucleic Acid Hybridization, A Practical Approach, IRL Press, Washington, D.C. (1985), herein incorporated by reference in its entirety. Departures from complete complementarity are therefore permissible, as long as such departures do not completely preclude the capacity of the molecules to form a double-stranded structure. In order for a nucleic acid molecule to serve as a primer or probe it need only be sufficiently complementary in sequence to be able to form a stable double-stranded structure under the particular solvent and salt concentrations employed.

As used herein, a substantially homologous sequence is a nucleic acid sequence that will specifically hybridize to the complement of the nucleic acid sequence to which it is being compared under high stringency conditions. The term “stringent conditions” is functionally defined with regard to the hybridization of a nucleic-acid probe to a target nucleic acid (i.e., to a particular nucleic-acid sequence of interest) by the specific hybridization procedure discussed in Sambrook et al., 1989, at 9.52-9.55. See also, Sambrook et al., 1989 at 9.47-9.52, 9.56-9.58 herein incorporated by reference in its entirety; Kanehisa, (Nucl. Acids Res. 12:203-213, 1984, herein incorporated by reference in its entirety); and Wetmur and Davidson, (J. Mol. Biol. 31:349-370, 1988, herein incorporated by reference in its entirety). Accordingly, the nucleotide sequences of the invention may be used for their ability to selectively form duplex molecules with complementary stretches of DNA fragments. Depending on the application envisioned, one will desire to employ varying conditions of hybridization to achieve varying degrees of selectivity of probe towards target sequence. For applications requiring high selectivity, one will typically desire to employ relatively stringent conditions to form the hybrids, e.g., one will select relatively low salt and/or high temperature conditions, such as provided by about 0.02 M to about 0.15 M NaCl at temperatures of about 50° C. to about 70° C. A stringent condition, for example, is to wash the hybridization filter at least twice with high-stringency wash buffer (0.2×SSC, 0.1% SDS, 65° C.). Appropriate stringency conditions which promote DNA hybridization, for example, 6.0× sodium chloride/sodium citrate (SSC) at about 45° C., followed by a wash of 2.0×SSC at 50° C., are known to those skilled in the art or can be found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6. For example, the salt concentration in the wash step can be selected from a low stringency of about 2.0×SSC at 50° C. to a high stringency of about 0.2×SSC at 50° C. In addition, the temperature in the wash step can be increased from low stringency conditions at room temperature, about 22° C., to high stringency conditions at about 65° C. Both temperature and salt may be varied, or either the temperature or the salt concentration may be held constant while the other variable is changed. Such selective conditions tolerate little, if any, mismatch between the probe and the template or target strand. Detection of DNA sequences via hybridization is well-known to those of skill in the art, and the teachings of U.S. Pat. Nos. 4,965,188 and 5,176,995 are exemplary of the methods of hybridization analyses.

“Identity” refers to the degree of similarity between two polynucleic acid or protein sequences. An alignment of the two sequences is performed by a suitable computer program. A widely used and accepted computer program for performing sequence alignments is CLUSTALW v1.6 (Thompson, et al. Nucl. Acids Res., 22: 4673-4680, 1994). The number of matching bases or amino acids is divided by the total number of bases or amino acids, and multiplied by 100 to obtain a percent identity. For example, if two 580 base pair sequences had 145 matched bases, they would be 25 percent identical. If the two compared sequences are of different lengths, the number of matches is divided by the shorter of the two lengths. For example, if there are 100 matched amino acids between a 200 and a 400 amino acid protein, they are 50 percent identical with respect to the shorter sequence. If the shorter sequence is less than 150 bases or 50 amino acids in length, the number of matches is divided by 150 (for nucleic acid bases) or 50 (for amino acids), and multiplied by 100 to obtain a percent identity.
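The percent identity arithmetic defined above can be illustrated with a short Python sketch. It assumes the number of matched positions has already been obtained from an alignment (for example, one produced by CLUSTALW); the function name `percent_identity` and the `kind` parameter are hypothetical and are introduced here only for illustration.

```python
def percent_identity(matches, len_a, len_b, kind="nucleic"):
    """Percent identity as defined in the text above.

    matches : number of matched bases or amino acids in the alignment
    len_a, len_b : lengths of the two compared sequences
    kind : "nucleic" or "protein" (hypothetical parameter name)
    """
    floors = {"nucleic": 150, "protein": 50}
    if kind not in floors:
        raise ValueError("kind must be 'nucleic' or 'protein'")
    # Divide by the shorter of the two lengths ...
    shorter = min(len_a, len_b)
    # ... unless the shorter sequence is below 150 bases or 50 amino acids,
    # in which case the fixed value 150 or 50 is used as the divisor.
    denominator = max(shorter, floors[kind])
    return 100.0 * matches / denominator


# The two worked examples given in the definition:
print(percent_identity(145, 580, 580))             # 145 of 580 bases
print(percent_identity(100, 200, 400, "protein"))  # 100 of 200 residues
# A short nucleic acid sequence (100 bases) uses 150 as the divisor:
print(percent_identity(90, 100, 120))
```

The fixed divisor for short sequences keeps a brief, perfectly matched stretch from being scored as 100 percent identical.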

As described herein, a protein can be “substantially identical” to related proteins. These proteins with substantial identity generally comprise at least one polypeptide sequence that has at least ninety-eight percent sequence identity compared to its related other polypeptide sequence. Percent identity may be determined using the Gap program in the WISCONSIN PACKAGE version 10.0-UNIX from Genetics Computer Group, Inc., which is based on the method of Needleman and Wunsch (J. Mol. Biol. 48:443-453, 1970), with the set of default parameters for pairwise comparison (for amino acid sequence comparison: Gap Creation Penalty=8, Gap Extension Penalty=20); or using the TBLASTN program in the BLAST 2.2.1 software suite (Altschul et al., Nucleic Acids Res. 25:3389-3402), with the BLOSUM62 matrix (Henikoff and Henikoff, Proc. Natl. Acad. Sci. U.S.A. 89:10915-10919, 1992) and the set of default parameters for pair-wise comparison (gap creation cost=11, gap extension cost=1). In BLAST, the E-value, or expectation value, represents the number of different alignments with scores equivalent to or better than the raw alignment score, S, that are expected to occur in a database search by chance. The lower the E value, the more significant the match. Because database size is an element in E-value calculations, E-values obtained by “BLASTing” against public databases, such as GenBank, have generally increased over time for any given query/entry match. Percent identity refers to the percentage of identically matched amino acid residues that exist along the length of that portion of the sequences which is aligned by the BLAST algorithm.

“Intron” refers to a portion of a gene not translated into protein, even though it is transcribed into RNA.

An “isolated” nucleic acid sequence is substantially separated or purified away from other nucleic acid sequences with which the nucleic acid is normally associated in the cell of the organism in which the nucleic acid naturally occurs, i.e., other chromosomal or extrachromosomal DNA. The term embraces nucleic acids that are biochemically purified so as to substantially remove contaminating nucleic acids and other cellular components. The term also embraces recombinant nucleic acids and chemically synthesized nucleic acids.

“Isolated,” “Purified,” “Homogeneous” polypeptides. A polypeptide is “isolated” if it has been separated from the cellular components (nucleic acids, lipids, carbohydrates, and other polypeptides) that naturally accompany it, or if it is chemically synthesized or recombinant. A monomeric polypeptide is isolated when at least 60% by weight of a sample is composed of the polypeptide, preferably 90% or more, more preferably 95% or more, and most preferably more than 99%. Protein purity or homogeneity is indicated, for example, by polyacrylamide gel electrophoresis of a protein sample, followed by visualization of a single polypeptide band upon staining the polyacrylamide gel; high pressure liquid chromatography; or other conventional methods. Proteins can be purified by any of the means known in the art, for example as described in Guide to Protein Purification, ed. Deutscher, Meth. Enzymol. 185, Academic Press, San Diego, 1990; and Scopes, Protein Purification: Principles and Practice, Springer Verlag, New York, 1982.

“Labeling” or “labeled”. There are a variety of conventional methods and reagents for labeling polynucleotides and polypeptides and fragments thereof. Typical labels include radioactive isotopes, ligands or ligand receptors, fluorophores, chemiluminescent agents, and enzymes. Methods for labeling and guidance in the choice of labels appropriate for various purposes are discussed, e.g., in Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press (1989) and Current Protocols in Molecular Biology, ed. Ausubel et al., Greene Publishing and Wiley-Interscience, New York, (1992).

“Mature protein coding region”, this term refers to the sequence of a processed protein product, i.e., a mature EPSP synthase remaining after the chloroplast transit peptide has been removed.

“Native”, the term “native” generally refers to a naturally-occurring (“wild-type”) polynucleic acid or polypeptide. However, in the context of the present invention, some modification of an isolated polynucleotide and polypeptide may have occurred to provide a polypeptide with a particular phenotype, e.g., amino acid substitution in glyphosate sensitive EPSPS to provide a glyphosate resistant EPSPS. For comparative purposes in the present invention, the isolated polynucleotide that contains a few substituted nucleotides to provide amino acid modification for herbicide tolerance is referred to as the “native” polynucleotide when compared to the substantially divergent polynucleotide created by the methods of the present invention. However, the “native” polynucleotide modified in this manner is non-native with respect to the genetic elements normally found linked to a naturally occurring unmodified polynucleotide.

“N-terminal region” refers to a region of a peptide, polypeptide, or protein chain from the amino acid having a free amino group to the middle of the chain.

“Nucleic acid” refers to deoxyribonucleic acid (DNA) and ribonucleic acid (RNA).

Nucleic acid codes: A=adenosine; C=cytosine; G=guanosine; T=thymidine. Codes used for synthesis of oligonucleotides: N=equimolar A, C, G, and T; I=deoxyinosine; K=equimolar G and T; R=equimolar A and G; S=equimolar C and G; W=equimolar A and T; Y=equimolar C and T.
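As an illustration of the oligonucleotide synthesis codes listed above, the following Python sketch enumerates every sequence represented by an oligonucleotide written with mixed-base codes. The names `MIXES` and `expand_oligo` are hypothetical. Deoxyinosine (I) is a single nucleoside and not an equimolar mixture, so this sketch simply carries it through unchanged; that handling is an assumption made for the example.

```python
from itertools import product

# Mixed-base codes exactly as listed above; "I" (deoxyinosine) is kept as-is.
MIXES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT", "K": "GT", "R": "AG",
    "S": "CG", "W": "AT", "Y": "CT",
    "I": "I",
}


def expand_oligo(oligo):
    """Return every concrete sequence represented by a mixed-base oligo."""
    pools = [MIXES[base] for base in oligo.upper()]
    return ["".join(combo) for combo in product(*pools)]


# "GAYTR" has two mixed positions (Y and R), so it represents 2 x 2 sequences.
for seq in expand_oligo("GAYTR"):
    print(seq)
```

The number of sequences grows multiplicatively with each mixed position, which is why an N position multiplies the size of the mixture by four.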

A “nucleic acid segment” or a “nucleic acid molecule segment” is a nucleic acid molecule that has been isolated free of total genomic DNA of a particular species, or that has been synthesized. Included with the term “nucleic acid segment” are DNA segments, recombinant vectors, plasmids, cosmids, phagemids, phage, viruses, et cetera.

“Nucleotide Sequence Variants”, using well-known methods, the skilled artisan can readily produce nucleotide and amino acid sequence variants of genes and proteins, respectively. For example, “variant” DNA molecules of the present invention are DNA molecules containing changes in an EPSPS gene sequence, i.e., changes in which one or more nucleotides of the EPSPS gene sequence are deleted, added, and/or substituted, such that the variant EPSPS gene encodes a protein that retains EPSPS activity. Variant DNA molecules can be produced, for example, by standard DNA mutagenesis techniques or by chemically synthesizing the variant DNA molecule or a portion thereof. Methods for chemical synthesis of nucleic acids are discussed, for example, in Beaucage et al., Tetra. Letts. 22:1859-1862 (1981), and Matteucci et al., J. Am. Chem. Soc. 103:3185-(1981). Chemical synthesis of nucleic acids can be performed, for example, on automated oligonucleotide synthesizers. Such variants preferably do not change the reading frame of the protein-coding region of the nucleic acid and preferably encode a protein having no change, or only a minor reduction.

“Open reading frame (ORF)” refers to a region of DNA or RNA encoding a peptide, polypeptide, or protein.

“Operably Linked”. A first nucleic-acid sequence is “operably” linked with a second nucleic-acid sequence when the first nucleic-acid sequence is placed in a functional relationship with the second nucleic-acid sequence. For example, a promoter is operably linked to a protein-coding sequence if the promoter effects the transcription or expression of the coding sequence. Generally, operably linked DNA sequences are contiguous and, where necessary to join two protein-coding regions, in reading frame.

“Overexpression” refers to the expression of a RNA or polypeptide or protein encoded by a DNA introduced into a host cell, wherein the RNA or polypeptide or protein is either not normally present in the host cell, or wherein the RNA or polypeptide or protein is present in said host cell at a higher level than that normally expressed from the endogenous gene encoding the RNA or polypeptide or protein.

The term “plant” encompasses any higher plant and progeny thereof, including monocots (e.g., corn, rice, wheat, barley, etc.), dicots (e.g., soybean, cotton, canola, tomato, potato, Arabidopsis, tobacco, etc.), gymnosperms (pines, firs, cedars, etc.) and includes parts of plants, including reproductive units of a plant (e.g., seeds, bulbs, tubers, fruit, flowers, etc.) or other parts or tissues from which the plant can be reproduced.

“Plant expression cassette” refers to chimeric DNA segments comprising the regulatory elements that are operably linked to provide the expression of a transgene product in plants.

“Plasmid” refers to a circular, extrachromosomal, self-replicating piece of DNA.

“Polyadenylation signal” or “polyA signal” refers to a nucleic acid sequence located 3′ to a coding region that causes the addition of adenylate nucleotides to the 3′ end of the mRNA transcribed from the coding region.

“Polymerase chain reaction (PCR)” refers to a DNA amplification method that uses an enzymatic technique to create multiple copies of one sequence of nucleic acid (amplicon). Copies of a DNA molecule are prepared by shuttling a DNA polymerase between two amplimers. The basis of this amplification method is multiple cycles of temperature changes to denature, then re-anneal amplimers (DNA primer molecules), followed by extension to synthesize new DNA strands in the region located between the flanking amplimers. Nucleic-acid amplification can be accomplished by any of the various nucleic-acid amplification methods known in the art, including the polymerase chain reaction (PCR). A variety of amplification methods are known in the art and are described, inter alia, in U.S. Pat. Nos. 4,683,195 and 4,683,202 and in PCR Protocols: A Guide to Methods and Applications, ed. Innis et al., Academic Press, San Diego, 1990. PCR amplification methods have been developed to amplify up to 22 kb of genomic DNA and up to 42 kb of bacteriophage DNA (Cheng et al., Proc. Natl. Acad. Sci. USA 91:5695-5699, 1994). These methods as well as other methods known in the art of DNA amplification may be used in the practice of the present invention.

“Polynucleotide” refers to a length of deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) molecules greater than two, which are connected to form a larger molecule.

Polypeptide fragments. The present invention also encompasses fragments of a protein that lack at least one residue of a native full-length protein, but that substantially maintain activity of the protein.

The term “promoter” or “promoter region” refers to a polynucleic acid molecule that functions as a regulatory element, usually found upstream (5′) to a coding sequence, that controls expression of the coding sequence by controlling production of messenger RNA (mRNA) by providing the recognition site for RNA polymerase and/or other factors necessary for start of transcription at the correct site. As contemplated herein, a promoter or promoter region includes variations of promoters derived by means of ligation to various regulatory sequences, random or controlled mutagenesis, and addition or duplication of enhancer sequences. The promoter region disclosed herein, and biologically functional equivalents thereof, are responsible for driving the transcription of coding sequences under their control when introduced into a host as part of a suitable recombinant vector, as demonstrated by their ability to produce mRNA.

“Recombinant”. A “recombinant” nucleic acid is made by a combination of two otherwise separated segments of sequence, e.g., by chemical synthesis or by the manipulation of isolated segments of nucleic acids by genetic engineering techniques.

The term “recombinant DNA construct” or “recombinant vector” refers to any agent such as a plasmid, cosmid, virus, autonomously replicating sequence, phage, or linear or circular single-stranded or double-stranded DNA or RNA nucleotide sequence, derived from any source, capable of genomic integration or autonomous replication, comprising a DNA molecule in which one or more DNA sequences have been linked in a functionally operative manner. Such recombinant DNA constructs or vectors are capable of introducing a 5′ regulatory sequence or promoter region and a DNA sequence for a selected gene product into a cell in such a manner that the DNA sequence is transcribed into a functional mRNA that is translated and therefore expressed. Recombinant DNA constructs or recombinant vectors may be constructed to be capable of expressing antisense RNAs, in order to inhibit translation of a specific RNA of interest.

“Regeneration” refers to the process of growing a plant from a plant cell (e.g., plant protoplast or explant).

“Reporter” refers to a gene and corresponding gene product that when expressed in transgenic organisms produces a product detectable by chemical or molecular methods or produces an observable phenotype.

“Resistance” refers to an enzyme that is able to function in the presence of a toxin, for example, glyphosate resistant class II EPSP synthases. An enzyme that has resistance to a toxin may have the function of detoxifying the toxin, e.g., the phosphinothricin acetyltransferase, glyphosate oxidoreductase, or may be a mutant enzyme having catalytic activity which is unaffected by an herbicide which disrupts the same activity in the wild type enzyme, e.g., acetolactate synthase, mutant class I EPSP synthases.

“Restriction enzyme” refers to an enzyme that recognizes a specific palindromic sequence of nucleotides in double stranded DNA and cleaves both strands; also called a restriction endonuclease. Cleavage typically occurs within the restriction site.
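In double stranded DNA, a “palindromic” sequence is one that reads the same 5′ to 3′ on both strands, that is, a sequence equal to its own reverse complement. The short Python sketch below shows one way to test this property; the function names are hypothetical, and the GAATTC example is the well-known EcoRI recognition site, which is not taken from this disclosure.

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site):
    """True when the site equals its own reverse complement."""
    return site.upper() == reverse_complement(site)


print(is_palindromic("GAATTC"))  # EcoRI recognition site
print(is_palindromic("GAATTA"))  # a non-palindromic hexamer
```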

“Selectable marker” refers to a polynucleic acid molecule that encodes a protein, which confers a phenotype facilitating identification of cells containing the polynucleic acid molecule. Selectable markers include those genes that confer resistance to antibiotics (e.g., ampicillin, kanamycin), complement a nutritional deficiency (e.g., uracil, histidine, leucine), or impart a visually distinguishing characteristic (e.g., color changes or fluorescence). Useful dominant selectable marker genes include antibiotic resistance genes (e.g., neomycin phosphotransferase, aad); and herbicide resistance genes (e.g., phosphinothricin acetyltransferase, class II EPSP synthase, modified class I EPSP synthase). A useful strategy for selection of transformants for herbicide resistance is described, e.g., in Vasil, Cell Culture and Somatic Cell Genetics of Plants, Vols. I-III, Laboratory Procedures and Their Applications, Academic Press, New York (1984).

The term “specific for (a target sequence)” indicates that a DNA probe or DNA primer hybridizes under given hybridization conditions only to the target sequence in a sample comprising the target sequence.

The term “substantially purified”, as used herein, refers to a molecule separated from other molecules normally associated with it in its native state. More preferably, a substantially purified molecule is the predominant species present in a preparation. A substantially purified molecule may be greater than 60% free, preferably 75% free, more preferably 90% free from the other molecules (exclusive of solvent) present in the natural mixture. The term “substantially purified” is not intended to encompass molecules present in their native state.

“Tolerant” or “tolerance” refers to a reduced effect of a biotic or abiotic agent on the growth and development of organisms and plants, e.g. a pest or a herbicide.

“Transcription” refers to the process of producing an RNA copy from a DNA template.

“Transformation” refers to a process of introducing an exogenous polynucleic acid molecule (e.g., a DNA construct, a recombinant polynucleic acid molecule) into a cell or protoplast such that the exogenous polynucleic acid molecule is incorporated into a chromosome or is capable of autonomous replication.

“Transformed” or “transgenic” refers to a cell, tissue, organ, or organism into which a foreign polynucleic acid, such as a DNA vector or recombinant polynucleic acid molecule, has been introduced. A “transgenic” or “transformed” cell or organism also includes progeny of the cell or organism and progeny produced from a breeding program employing such a “transgenic” plant as a parent in a cross and exhibiting an altered phenotype resulting from the presence of the foreign polynucleic acid molecule.

The term “transgene” refers to any polynucleic acid molecule non-native to a cell or organism transformed into the cell or organism. “Transgene” also encompasses the component parts of a native plant gene modified by insertion of a non-native polynucleic acid molecule by directed recombination or site specific mutation.

“Transit peptide” or “targeting peptide” molecules, these terms generally refer to peptide molecules that when linked to a protein of interest direct the protein to a particular tissue, cell, subcellular location, or cell organelle. Examples include, but are not limited to, chloroplast transit peptides, nuclear targeting signals, and vacuolar signals. The chloroplast transit peptide is of particular utility in the present invention to direct expression of the EPSPS enzyme to the chloroplast.

The term “translation” refers to the production of the corresponding gene product, i.e., a peptide, polypeptide, or protein from a mRNA.

“Vector” refers to a plasmid, cosmid, bacteriophage, or virus that carries foreign DNA into a host organism.

Polynucleotides

Methods of the present invention include designing genes that confer a trait of interest to the plant into which they are introduced. Transgenes of agronomic interest provide beneficial agronomic traits to crop plants and include, but are not limited to, genetic elements for herbicide resistance (U.S. Pat. No. 5,633,435; U.S. Pat. No. 5,463,175), increased yield (U.S. Pat. No. 5,716,837), insect control (U.S. Pat. No. 6,063,597; U.S. Pat. No. 6,063,756; U.S. Pat. No. 6,093,695; U.S. Pat. No. 5,942,664; U.S. Pat. No. 6,110,464), fungal disease resistance (U.S. Pat. No. 5,516,671; U.S. Pat. No. 5,773,696; U.S. Pat. No. 6,121,436; U.S. Pat. No. 6,316,407; and U.S. Pat. No. 6,506,962), virus resistance (U.S. Pat. No. 5,304,730 and U.S. Pat. No. 6,013,864), nematode resistance (U.S. Pat. No. 6,228,992), bacterial disease resistance (U.S. Pat. No. 5,516,671), starch production (U.S. Pat. No. 5,750,876 and U.S. Pat. No. 6,476,295), modified oils production (U.S. Pat. No. 6,444,876), high oil production (U.S. Pat. No. 5,608,149 and U.S. Pat. No. 6,476,295), modified fatty acid content (U.S. Pat. No. 6,537,750), high protein production (U.S. Pat. No. 6,380,466), fruit ripening (U.S. Pat. No. 5,512,466), enhanced animal and human nutrition (U.S. Pat. No. 5,985,605 and U.S. Pat. No. 6,171,640), biopolymers (U.S. Pat. No. 5,958,745 and US Patent Publication No. US20030028917), environmental stress resistance (U.S. Pat. No. 6,072,103), pharmaceutical peptides (U.S. Pat. No. 6,080,560), improved processing traits (U.S. Pat. No. 6,476,295), improved digestibility (U.S. Pat. No. 6,531,648), low raffinose (U.S. Pat. No. 6,166,292), industrial enzyme production (U.S. Pat. No. 5,543,576), improved flavor (U.S. Pat. No. 6,011,199), nitrogen fixation (U.S. Pat. No. 5,229,114), hybrid seed production (U.S. Pat. No. 5,689,041), and biofuel production (U.S. Pat. No. 5,998,700). The genetic elements and transgenes described in the patents listed above are herein incorporated by reference.

Herbicides for which transgenic plant tolerance has been demonstrated, and to which the method of the present invention can be applied, include but are not limited to: glyphosate, glufosinate, sulfonylureas, imidazolinones, bromoxynil, dalapon, cyclohexanedione, protoporphyrinogen oxidase inhibitors, and isoxaflutole herbicides. Polynucleotide molecules encoding proteins involved in herbicide tolerance are known in the art, and include, but are not limited to, a polynucleotide molecule encoding 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS, described in U.S. Pat. Nos. 5,627,061, 5,633,435, 6,040,497; Padgette et al. Herbicide Resistant Crops, Lewis Publishers, 53-85, 1996; and Penaloza-Vazquez, et al. Plant Cell Reports 14:482-487, 1995) and aroA (U.S. Pat. No. 5,094,945) for glyphosate tolerance; bromoxynil nitrilase (Bxn) for bromoxynil tolerance (U.S. Pat. No. 4,810,648); phytoene desaturase (crtI, Misawa et al. (1993) Plant J. 4:833-840, and (1994) Plant J. 6:481-489) for tolerance to norflurazon; acetohydroxyacid synthase (AHAS, aka ALS, Sathasivan et al. Nucl. Acids Res. 18:2188-2193, 1990); and the bar gene for tolerance to glufosinate and bialaphos (DeBlock, et al. EMBO J. 6:2513-2519, 1987).

Herbicide tolerance is a desirable phenotype for crop plants. N-phosphonomethylglycine, also known as glyphosate, is a well known herbicide that has activity on a broad spectrum of plant species. Glyphosate is the active ingredient of Roundup® (Monsanto Co.), a safe herbicide having a desirably short half life in the environment. When applied onto a plant surface, glyphosate moves systemically through the plant. Glyphosate is toxic to plants by inhibiting the shikimic acid pathway, which provides a precursor for the synthesis of aromatic amino acids. Specifically, glyphosate affects the conversion of phosphoenolpyruvate and 3-phosphoshikimic acid to 5-enolpyruvyl-3-phosphoshikimic acid by inhibiting the enzyme 5-enolpyruvyl-3-phosphoshikimate synthase (hereinafter referred to as EPSP synthase or EPSPS). For purposes of the present invention, the term “glyphosate” should be considered to include any herbicidally effective form of N-phosphonomethylglycine (including any salt thereof) and other forms which result in the production of the glyphosate anion in planta.

Through plant genetic engineering methods, it is possible to produce glyphosate tolerant plants by inserting into the plant genome a DNA molecule that causes the production of higher levels of wild-type EPSPS (Shah et al., Science 233:478-481, 1986). Glyphosate tolerance can also be achieved by the expression of EPSPS variants that have lower affinity for glyphosate and therefore retain their catalytic activity in the presence of glyphosate (U.S. Pat. No. 5,633,435). Enzymes that degrade glyphosate in the plant tissues (U.S. Pat. No. 5,463,175) are also capable of conferring cellular tolerance to glyphosate. Such genes, therefore, allow for the production of transgenic crops that are tolerant to glyphosate, thereby allowing glyphosate to be used for effective weed control with minimal concern of crop damage. For example, glyphosate tolerance has been genetically engineered into corn (U.S. Pat. Nos. 5,554,798, 6,040,497), wheat (Zhou et al. Plant Cell Rep. 15:159-163, 1995), soybean (WO 9200377) and canola (WO 9204449).

Variants of the wild-type EPSPS enzyme have been isolated that are glyphosate-resistant as a result of alterations in the EPSPS amino acid coding sequence (Kishore et al., Annu. Rev. Biochem. 57:627-663, 1988; Schulz et al., Arch. Microbiol. 137:121-123, 1984; Sost et al., FEBS Lett. 173:238-241, 1984; Kishore et al., In “Biotechnology for Crop Protection” ACS Symposium Series No. 379. eds. Hedlin et al., 37-48, 1988). These variants typically have a higher K_(i) for glyphosate than the wild-type EPSPS enzyme, which confers the glyphosate-tolerant phenotype, but these variants are also characterized by a high K_(m) for PEP that makes the enzyme kinetically less efficient. For example, the apparent K_(m) for PEP and the apparent K_(i) for glyphosate for the native EPSPS from E. coli are 10 μM and 0.5 μM, respectively, while for a glyphosate-resistant isolate having a single amino acid substitution of an alanine for the glycine at position 96 these values are 220 μM and 4.0 mM, respectively. U.S. Pat. No. 6,040,497 reports that the mutation known as the TIPS mutation (a substitution of isoleucine for threonine at amino acid position 102 and a substitution of serine for proline at amino acid position 106) comprises two mutations that when introduced into the polypeptide sequence of Zea mays EPSPS confer glyphosate resistance to the enzyme. Transgenic plants containing this mutant enzyme are tolerant to glyphosate. Identical mutations may be made in glyphosate sensitive EPSPS enzymes from other plant sources to create glyphosate resistant enzymes.
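The trade-off described above for the E. coli enzymes can be made concrete by placing the four kinetic constants on a common unit and comparing them. The Python sketch below uses only the values stated in the preceding paragraph. The K_(i)/K_(m) ratio is offered as one rough, illustrative figure of merit and is not a measure defined in this disclosure; the dictionary keys and function name are hypothetical.

```python
UM_PER_MM = 1000.0  # micromolar per millimolar

# Apparent constants from the paragraph above, all expressed in micromolar.
ENZYMES = {
    "native": {"km_pep_um": 10.0, "ki_glyphosate_um": 0.5},
    "gly96ala": {"km_pep_um": 220.0, "ki_glyphosate_um": 4.0 * UM_PER_MM},
}


def ki_over_km(enzyme):
    """Ratio of K_i (glyphosate) to K_m (PEP); larger favors substrate binding."""
    return enzyme["ki_glyphosate_um"] / enzyme["km_pep_um"]


native, variant = ENZYMES["native"], ENZYMES["gly96ala"]

# Fold change in each constant for the variant relative to the native enzyme.
km_fold = variant["km_pep_um"] / native["km_pep_um"]
ki_fold = variant["ki_glyphosate_um"] / native["ki_glyphosate_um"]

print(f"K_m (PEP) fold increase: {km_fold:.0f}")
print(f"K_i (glyphosate) fold increase: {ki_fold:.0f}")
print(f"K_i/K_m, native: {ki_over_km(native):.2f}")
print(f"K_i/K_m, variant: {ki_over_km(variant):.2f}")
```

On these figures the variant gains far more in K_(i) than it loses in K_(m), yet the rise in K_(m) for PEP is the kinetic cost noted in the text.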

A variety of native and variant EPSPS enzymes have been expressed in transgenic plants in order to confer glyphosate tolerance (Singh, et al., In “Biosynthesis and Molecular Regulation of Amino Acids in Plants”, Amer Soc Plant Phys. Pubs., 1992). Examples of some of these EPSPSs include those described and/or isolated in accordance with U.S. Pat. No. 4,940,835, U.S. Pat. No. 4,971,908, U.S. Pat. No. 5,145,783, U.S. Pat. No. 5,188,642, U.S. Pat. No. 5,310,667, and U.S. Pat. No. 5,312,910. They can also be derived from a structurally distinct class of non-homologous EPSPS genes, such as the class II EPSPS genes isolated from Agrobacterium sp. strain CP4 as described in U.S. Pat. No. 5,633,435 and U.S. Pat. No. 5,627,061.

Chloroplast transit peptides (CTPs) are engineered to be fused to the N terminus of the bacterial EPSPS to direct the glyphosate resistant enzymes into the plant chloroplast. In the native plant EPSPS, chloroplast transit peptide regions are contained in the native coding sequence (e.g., CTP2, Klee et al., Mol. Gen. Genet. 210:437-442, 1987, herein incorporated by reference in its entirety). The native CTP may be substituted with a heterologous CTP during construction of a transgene plant expression cassette. Many chloroplast-localized proteins, including EPSPS, are expressed from nuclear genes as precursors and are targeted to the chloroplast by a chloroplast transit peptide (CTP) that is removed during the import steps. Examples of other such chloroplast proteins include the small subunit (SSU) of Ribulose-1,5-bisphosphate carboxylase, Ferredoxin, Ferredoxin oxidoreductase, the light-harvesting complex protein I and protein II, and Thioredoxin F. It has been demonstrated in vivo and in vitro that non-chloroplast proteins may be targeted to the chloroplast by use of protein fusions with a CTP and that a CTP sequence is sufficient to target a protein to the chloroplast. Incorporation of a suitable chloroplast transit peptide, such as the Arabidopsis thaliana EPSPS CTP (Klee et al., Mol. Gen. Genet. 210:437-442 (1987)) and the Petunia hybrida EPSPS CTP (della-Cioppa et al., Proc. Natl. Acad. Sci. USA 83:6873-6877 (1986)), has been shown to target heterologous EPSPS protein sequences to chloroplasts in transgenic plants. The production of glyphosate tolerant plants by expression of a fusion protein comprising an amino-terminal CTP with a glyphosate resistant EPSPS enzyme is well known by those skilled in the art (U.S. Pat. No. 5,627,061, U.S. Pat. No. 5,633,435, U.S. Pat. No. 5,312,910, EP 0218571, EP 189707, EP 508909, and EP 924299). Those skilled in the art will recognize that various chimeric constructs can be made that utilize the functionality of a particular CTP to import glyphosate resistant EPSPS enzymes into the plant cell chloroplast.

Modification and changes may be made in the structure of the polynucleotides of the invention and still obtain a molecule that encodes a functional protein or peptide with desirable characteristics. The following is a method based upon substituting the codon(s) of a first polynucleotide to create an equivalent, or even an improved, second-generation artificial polynucleotide, where this new artificial polynucleotide is useful in methods of transgene stacking and enhanced expression. It is contemplated that the codon substitutions in the second-generation polynucleotide can in certain instances result in at least one amino acid different from that of the first polynucleotide. The amino acid substitution may provide an improved characteristic to the protein, e.g., a glyphosate resistant EPSP synthase, or it may be a conserved change that does not substantially affect the characteristics of the protein. The method provides for an artificial polynucleotide created by the backtranslation of a polypeptide sequence into a polynucleotide using a codon usage table, followed by steps to enhance characteristics of the artificial polynucleotide that make it particularly useful in transgenic plants.

In particular embodiments of the invention, modified polynucleotides encoding herbicide resistant proteins are contemplated to be useful for at least one of the following: to confer herbicide tolerance in a transformed or transgenic plant, to improve expression of herbicide resistance genes in plants, for use as selectable markers for introduction of other traits of interest into a plant, and to prevent recombination with a similar endogenous plant gene or existing transgene, further allowing gene stacking without gene silencing.

It is known that the genetic code is degenerate. The amino acids and their RNA codon(s) are listed below in Table 1.

TABLE 1
Amino acids and the RNA codons that encode them.

Amino Acid                                 Codons
Full name; 3 letter code; 1 letter code
Alanine; Ala; A                            GCA GCC GCG GCU
Cysteine; Cys; C                           UGC UGU
Aspartic acid; Asp; D                      GAC GAU
Glutamic acid; Glu; E                      GAA GAG
Phenylalanine; Phe; F                      UUC UUU
Glycine; Gly; G                            GGA GGC GGG GGU
Histidine; His; H                          CAC CAU
Isoleucine; Ile; I                         AUA AUC AUU
Lysine; Lys; K                             AAA AAG
Leucine; Leu; L                            UUA UUG CUA CUC CUG CUU
Methionine; Met; M                         AUG
Asparagine; Asn; N                         AAC AAU
Proline; Pro; P                            CCA CCC CCG CCU
Glutamine; Gln; Q                          CAA CAG
Arginine; Arg; R                           AGA AGG CGA CGC CGG CGU
Serine; Ser; S                             AGC AGU UCA UCC UCG UCU
Threonine; Thr; T                          ACA ACC ACG ACU
Valine; Val; V                             GUA GUC GUG GUU
Tryptophan; Trp; W                         UGG
Tyrosine; Tyr; Y                           UAC UAU

The codons are described in terms of RNA bases (adenine, uracil, guanine and cytosine) because it is the mRNA that is directly translated into polypeptides. It is understood that when designing a DNA polynucleotide for use in a construct, the DNA bases would be substituted, e.g. thymine instead of uracil.

It is desirable to provide transgenic plants that have multiple agronomically improved phenotypes. Often herbicide tolerance is used as a selectable marker to assist in the production of transgenic plants that may possess additional genes of agronomic importance. The stacking of the transgenes by traditional breeding methods or by retransformation of a first transgenic plant with an additional plant expression cassette may include the introduction of genes or genetic elements that have identical or nearly identical polynucleotide sequence. The progeny containing these stacked genes may be susceptible to loss of gene expression due to gene silencing. The method of the present invention provides a modified polynucleotide molecule that encodes a herbicide resistant protein. The polynucleotide molecules are designed to be sufficiently divergent in polynucleotide sequence from other polynucleotide molecules that encode the same herbicide resistance protein. These molecules can then coexist in the same plant cell without the concern of gene silencing.

The divergent polynucleotide sequence is created by using a codon usage table built from the known coding sequences of various plant species. For example, codon usage tables for Arabidopsis thaliana, Zea mays, and Glycine max can be used in the method to design the polynucleotides of the present invention. Other codon usage tables from other plants can also be used by those of ordinary skill in the art.

The first step in the method for designing a new artificial polynucleotide molecule that encodes a herbicide tolerance protein is the use of a codon usage table to determine the percent codon usage in a plant species for each amino acid of the herbicide tolerance protein, followed by replacing at least one of every eight contiguous codons with a different codon selected from the codon usage table and adjusting the percent codon usage for each amino acid encoded by the polynucleotide to substantially the same percent codon usage found in the codon usage table. Additional steps can include introducing a translational stop codon in the second and third open reading frame of the new polynucleotide sequence; eliminating some translational start codons in the second and third open reading frames; adjusting the local GC:AT ratio to about 2:1 over a range of about 50 nucleotides; disrupting potential polyadenylation signals or potential intron splice sites; removing at least one restriction enzyme site of six contiguous nucleotides or greater; and comparing the sequence identity of the new artificial polynucleotide to an existing polynucleotide that encodes the same or similar protein so that the sequence identity between the two polynucleotides is not more than 85 percent.

A back translation of a protein sequence to a nucleotide sequence may be performed using a codon usage table, such as those found on Genetics Computer Group (GCG) SeqLab or other DNA analysis programs known to those skilled in the art of DNA analysis, or as provided in Tables 2, 3 and 4 of the present invention. The codon usage tables for Arabidopsis thaliana (Table 2), Zea mays (Table 3) and Glycine max (Table 4) are examples of tables that can be constructed for plant species; codon usage tables can also be constructed that represent monocot or dicot codon usage.

TABLE 2
Arabidopsis thaliana codon usage table.

Amino Acid  Codon  Number     /1000  Fraction
Gly         GGG    188335.00  10.18  0.16
Gly         GGA    443469.00  23.98  0.37
Gly         GGT    409478.00  22.14  0.34
Gly         GGC    167099.00  9.03   0.14
Glu         GAG    596506.00  32.25  0.48
Glu         GAA    639579.00  34.58  0.52
Asp         GAT    683652.00  36.96  0.68
Asp         GAC    318211.00  17.20  0.32
Val         GTG    320636.00  17.34  0.26
Val         GTA    185889.00  10.05  0.15
Val         GTT    505487.00  27.33  0.41
Val         GTC    235004.00  12.71  0.19
Ala         GCG    162272.00  8.77   0.14
Ala         GCA    323871.00  17.51  0.27
Ala         GCT    521181.00  28.18  0.44
Ala         GCC    189049.00  10.22  0.16
Arg         AGG    202204.00  10.93  0.20
Arg         AGA    348508.00  18.84  0.35
Ser         AGT    260896.00  14.11  0.16
Ser         AGC    206774.00  11.18  0.13
Lys         AAG    605882.00  32.76  0.51
Lys         AAA    573121.00  30.99  0.49
Asn         AAT    418805.00  22.64  0.52
Asn         AAC    385650.00  20.85  0.48
Met         ATG    452482.00  24.46  1.00
Ile         ATA    235528.00  12.73  0.24
Ile         ATT    404070.00  21.85  0.41
Ile         ATC    341584.00  18.47  0.35
Thr         ACG    140880.00  7.62   0.15
Thr         ACA    291436.00  15.76  0.31
Thr         ACT    326366.00  17.65  0.34
Thr         ACC    190135.00  10.28  0.20
Trp         TGG    231618.00  12.52  1.00
End         TGA    19037.00   1.03   0.43
Cys         TGT    196601.00  10.63  0.60
Cys         TGC    131390.00  7.10   0.40
End         TAG    9034.00    0.49   0.20
End         TAA    16317.00   0.88   0.37
Tyr         TAT    276714.00  14.96  0.52
Tyr         TAC    254890.00  13.78  0.48
Leu         TTG    389368.00  21.05  0.22
Leu         TTA    237547.00  12.84  0.14
Phe         TTT    410976.00  22.22  0.52
Phe         TTC    380505.00  20.57  0.48
Ser         TCG    167804.00  9.07   0.10
Ser         TCA    334881.00  18.11  0.20
Ser         TCT    461774.00  24.97  0.28
Ser         TCC    203174.00  10.99  0.12
Arg         CGG    88712.00   4.80   0.09
Arg         CGA    115857.00  6.26   0.12
Arg         CGT    165276.00  8.94   0.17
Arg         CGC    69006.00   3.73   0.07
Gln         CAG    280077.00  15.14  0.44
Gln         CAA    359922.00  19.46  0.56
His         CAT    256758.00  13.88  0.62
His         CAC    160485.00  8.68   0.38
Leu         CTG    183128.00  9.90   0.11
Leu         CTA    184587.00  9.98   0.11
Leu         CTT    447606.00  24.20  0.26
Leu         CTC    294275.00  15.91  0.17
Pro         CCG    155222.00  8.39   0.17
Pro         CCA    298880.00  16.16  0.33
Pro         CCT    342406.00  18.51  0.38
Pro         CCC    97639.00   5.28   0.11

TABLE 3
Zea mays codon usage table.

Amino Acid  Codon  Number    /1000  Fraction
Gly         GGG    8069.00   15.19  0.21
Gly         GGA    7100.00   13.37  0.18
Gly         GGT    7871.00   14.82  0.20
Gly         GGC    15904.00  29.94  0.41
Glu         GAG    22129.00  41.67  0.68
Glu         GAA    10298.00  19.39  0.32
Asp         GAT    11996.00  22.59  0.41
Asp         GAC    17045.00  32.09  0.59
Val         GTG    13873.00  26.12  0.38
Val         GTA    3230.00   6.08   0.09
Val         GTT    8261.00   15.55  0.23
Val         GTC    11330.00  21.33  0.31
Ala         GCG    11778.00  22.18  0.24
Ala         GCA    8640.00   16.27  0.18
Ala         GCT    11940.00  22.48  0.24
Ala         GCC    16768.00  31.57  0.34
Arg         AGG    7937.00   14.94  0.27
Arg         AGA    4356.00   8.20   0.15
Ser         AGT    3877.00   7.30   0.10
Ser         AGC    8653.00   16.29  0.23
Lys         AAG    22367.00  42.11  0.74
Lys         AAA    7708.00   14.51  0.26
Asn         AAT    6997.00   13.17  0.36
Asn         AAC    12236.00  23.04  0.64
Met         ATG    12841.00  24.18  1.00
Ile         ATA    3997.00   7.53   0.16
Ile         ATT    7457.00   14.04  0.31
Ile         ATC    12925.00  24.34  0.53
Thr         ACG    5665.00   10.67  0.22
Thr         ACA    5408.00   10.18  0.21
Thr         ACT    5774.00   10.87  0.22
Thr         ACC    9256.00   17.43  0.35
Trp         TGG    6695.00   12.61  1.00
End         TGA    591.00    1.11   0.45
Cys         TGT    2762.00   5.20   0.30
Cys         TGC    6378.00   12.01  0.70
End         TAG    411.00    0.77   0.32
End         TAA    299.00    0.56   0.23
Tyr         TAT    4822.00   9.08   0.31
Tyr         TAC    10546.00  19.86  0.69
Leu         TTG    6677.00   12.57  0.14
Leu         TTA    2784.00   5.24   0.06
Phe         TTT    6316.00   11.89  0.32
Phe         TTC    13362.00  25.16  0.68
Ser         TCG    5556.00   10.46  0.14
Ser         TCA    5569.00   10.49  0.15
Ser         TCT    6149.00   11.58  0.16
Ser         TCC    8589.00   16.17  0.22
Arg         CGG    4746.00   8.94   0.16
Arg         CGA    2195.00   4.13   0.07
Arg         CGT    3113.00   5.86   0.10
Arg         CGC    7374.00   13.88  0.25
Gln         CAG    13284.00  25.01  0.64
Gln         CAA    7632.00   14.37  0.36
His         CAT    5003.00   9.42   0.39
His         CAC    7669.00   14.44  0.61
Leu         CTG    13327.00  25.09  0.28
Leu         CTA    3785.00   7.13   0.08
Leu         CTT    8238.00   15.51  0.17
Leu         CTC    12942.00  24.37  0.27
Pro         CCG    8274.00   15.58  0.27
Pro         CCA    7845.00   14.77  0.26
Pro         CCT    7129.00   13.42  0.23
Pro         CCC    7364.00   13.87  0.24

TABLE 4
Glycine max codon usage table.

Amino Acid  Codon  Number   /1000  Fraction
Gly         GGG    3097.00  12.82  0.18
Gly         GGA    5434.00  22.49  0.32
Gly         GGT    5248.00  21.72  0.31
Gly         GGC    3339.00  13.82  0.20
Glu         GAG    8296.00  34.33  0.50
Glu         GAA    8194.00  33.91  0.50
Asp         GAT    7955.00  32.92  0.62
Asp         GAC    4931.00  20.40  0.38
Val         GTG    5342.00  22.11  0.32
Val         GTA    1768.00  7.32   0.11
Val         GTT    6455.00  26.71  0.39
Val         GTC    2971.00  12.29  0.18
Ala         GCG    1470.00  6.08   0.08
Ala         GCA    5421.00  22.43  0.31
Ala         GCT    6796.00  28.12  0.38
Ala         GCC    4042.00  16.73  0.23
Arg         AGG    3218.00  13.32  0.28
Arg         AGA    3459.00  14.31  0.30
Ser         AGT    2935.00  12.15  0.17
Ser         AGC    2640.00  10.92  0.15
Lys         AAG    9052.00  37.46  0.59
Lys         AAA    6370.00  26.36  0.41
Asn         AAT    5132.00  21.24  0.48
Asn         AAC    5524.00  22.86  0.52
Met         ATG    5404.00  22.36  1.00
Ile         ATA    3086.00  12.77  0.23
Ile         ATT    6275.00  25.97  0.47
Ile         ATC    3981.00  16.47  0.30
Thr         ACG    1006.00  4.16   0.08
Thr         ACA    3601.00  14.90  0.29
Thr         ACT    4231.00  17.51  0.34
Thr         ACC    3562.00  14.74  0.29
Trp         TGG    2866.00  11.86  1.00
End         TGA    221.00   0.91   0.36
Cys         TGT    1748.00  7.23   0.49
Cys         TGC    1821.00  7.54   0.51
End         TAG    143.00   0.59   0.23
End         TAA    256.00   1.06   0.41
Tyr         TAT    3808.00  15.76  0.51
Tyr         TAC    3667.00  15.17  0.49
Leu         TTG    5343.00  22.11  0.24
Leu         TTA    2030.00  8.40   0.09
Phe         TTT    4964.00  20.54  0.49
Phe         TTC    5067.00  20.97  0.51
Ser         TCG    1107.00  4.58   0.06
Ser         TCA    3590.00  14.86  0.21
Ser         TCT    4238.00  17.54  0.24
Ser         TCC    2949.00  12.20  0.17
Arg         CGG    683.00   2.83   0.06
Arg         CGA    964.00   3.99   0.08
Arg         CGT    1697.00  7.02   0.15
Arg         CGC    1538.00  6.36   0.13
Gln         CAG    4147.00  17.16  0.46
Gln         CAA    4964.00  20.54  0.54
His         CAT    3254.00  13.47  0.55
His         CAC    2630.00  10.88  0.45
Leu         CTG    2900.00  12.00  0.13
Leu         CTA    1962.00  8.12   0.09
Leu         CTT    5676.00  23.49  0.26
Leu         CTC    4053.00  16.77  0.18
Pro         CCG    1022.00  4.23   0.08
Pro         CCA    4875.00  20.17  0.37
Pro         CCT    4794.00  19.84  0.36
Pro         CCC    2445.00  10.12  0.19

Codon usage tables are well known in the art and can be found in gene databases, e.g., the Genbank database. The Codon Usage Database is an extended WWW version of CUTG (Codon Usage Tabulated from Genbank). The frequency of codon usage in each organism is made searchable through this World Wide Web site (Nakamura et al. Nucleic Acids Res. 28:292, 2000).
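The Fraction column of a codon usage table can be reproduced from the Number column by dividing each codon's count by the total count for its amino acid. The following is a minimal Python sketch using the four Gly rows of Table 2; the function name `codon_fractions` is illustrative only and is not part of the disclosure.

```python
def codon_fractions(counts):
    """Return each codon's share of the total count for one amino acid.

    `counts` maps codon -> observed count (the "Number" column).
    Hypothetical helper name, chosen for illustration.
    """
    total = sum(counts.values())
    return {codon: round(n / total, 2) for codon, n in counts.items()}


# Gly rows of Table 2 (Arabidopsis thaliana), "Number" column.
GLY_COUNTS = {"GGG": 188335, "GGA": 443469, "GGT": 409478, "GGC": 167099}

gly_fractions = codon_fractions(GLY_COUNTS)
```

The rounded values agree with the Fraction column listed for Gly in Table 2.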

In various embodiments of the invention, the steps may be performed in any order or simultaneously. Any or all of the steps may be performed in the design of an artificial polynucleotide of the invention. Each step is described in detail below.

Different codons for a particular amino acid should be distributed throughout the polynucleotide based on the approximate percentage codon usage for a particular species from a codon usage table. Local clusters of identical codons should be avoided. At least one codon is substituted for every eight contiguous codons to provide sufficient divergence of polynucleotide sequences that encode identical or similar proteins. Except where specifically desired, e.g. to provide a herbicide tolerant enzyme, the encoded protein remains unchanged by substituting one codon for another codon that is translated to the same amino acid as listed in Table 1.
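The one-in-eight rule can be checked mechanically. The sketch below assumes the original and redesigned coding sequences are in frame and encode the same number of codons; it reports every window of eight contiguous codons in which no codon was substituted. The function names are hypothetical.

```python
def codons(seq):
    """Split an in-frame DNA sequence into codons (trailing partial codon dropped)."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def windows_without_change(original, redesigned, window=8):
    """Return codon indices where a window of `window` codons has no substitution."""
    a, b = codons(original), codons(redesigned)
    if len(a) != len(b):
        raise ValueError("sequences must encode the same number of codons")
    unchanged = [x == y for x, y in zip(a, b)]
    return [i for i in range(len(a) - window + 1)
            if all(unchanged[i:i + window])]
```

An empty result means every run of eight contiguous codons contains at least one substituted codon.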

In embodiments of the present invention, corrections are made to the local GC:AT ratio of a polynucleotide by adjusting the local GC:AT ratio to be about the same ratio as the full length polynucleotide, but not higher than 2× over a range of about 50 contiguous nucleotides of the polynucleotide molecule. The range of GC:AT ratios of a polynucleotide using codon usage tables from dicot plants should be from about 0.9 to about 1.3, and for monocot plants from about 1.2 to about 1.7. The local GC:AT ratio may be important in maintenance of appropriate secondary structure of RNA. Regions comprising many consecutive A+T bases or G+C bases are predicted to have a higher likelihood to form hairpin structures due to self-complementarity. Therefore, replacement with a different codon would reduce the likelihood of self-complementary secondary structure formation, which is known to reduce transcription and/or translation in some organisms. In most cases, the adverse effects may be minimized by using polynucleotide molecules that do not contain more than five consecutive A+T or G+C. The maximum length of a local GC tract (without any AT nucleotide) should be no longer than 10 nucleotides. Therefore, codons encoding Gly, Ala, Arg, Ser, and Pro in proteins rich in these amino acids can be substituted to prevent long clusters of GC nucleotides. The listed GC rich codons may be used in combination with the AT rich codons for the amino acids Lys, Asn, Ile, Tyr, Leu, and Phe, and vice versa, to correct the local GC:AT ratio.
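A sliding-window calculation is one way to inspect the local GC:AT ratio and the longest uninterrupted GC tract. The sketch below assumes an uppercase DNA string and treats the ratio as the count of G+C divided by the count of A+T; the helper names and the default window of 50 nucleotides are illustrative choices.

```python
import re


def gc_at_ratio(seq):
    """GC:AT ratio of a DNA string; infinity when the string has no A or T."""
    gc = sum(seq.count(b) for b in "GC")
    at = sum(seq.count(b) for b in "AT")
    return gc / at if at else float("inf")


def local_gc_at_ratios(seq, window=50):
    """GC:AT ratio of every window of `window` contiguous nucleotides."""
    return [gc_at_ratio(seq[i:i + window])
            for i in range(len(seq) - window + 1)]


def longest_gc_run(seq):
    """Length of the longest stretch that contains only G and C."""
    return max((len(run) for run in re.findall(r"[GC]+", seq)), default=0)
```

Windows whose ratio exceeds twice the full-length ratio, or GC runs longer than 10 nucleotides, would be candidates for codon substitution.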

A sequence identity check using nucleotide sequence alignment tools such as the GAP program (GCG, Madison, Wis.) can be done immediately after back translation to ensure that the generated sequence has an appropriate degree of sequence diversity. Contiguous polynucleotide sequences longer than 23 nucleotides having one hundred percent sequence identity should be eliminated by making codon substitutions in these lengths of sequence.
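When the two sequences differ only by codon substitutions, they have the same length and can be compared position by position without gaps. Under that assumption (this is not a replacement for a gapped alignment such as GAP), the following sketch lists identical stretches longer than 23 nucleotides; the function name is hypothetical.

```python
def identical_stretches(a, b, min_len=24):
    """Yield (start, length) of ungapped identical runs of at least `min_len` nt."""
    start = None
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y:
            if start is None:
                start = i          # a new identical run begins here
        else:
            if start is not None and i - start >= min_len:
                yield start, i - start
            start = None
    end = min(len(a), len(b))
    if start is not None and end - start >= min_len:
        yield start, end - start   # run that extends to the end of the sequences
```

Each reported stretch marks a region where at least one further codon substitution is needed.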

Translational start codons (ATG in the DNA, AUG in the mRNA) may be present in the second reading frame (frame “b”), the third reading frame (frame “c”), and the reverse reading frames (frames “d”, “e”, and “f”). The second and third frame start codons may initiate translation, however much less efficiently than the first. Therefore, if one or two AUG codons found near the 5′ end of an mRNA molecule reside in frame “b” or “c”, it would be beneficial to eliminate them in a polynucleotide region that contains at least the first three Met codons in frame “a”. Also, if the protein sequence does not have more than one Met in frame “a”, then as many as possible should be eliminated from the “b” or “c” forward frames. To perform this, for example, the codons for the amino acids Asp, Asn, Tyr, or His in the protein of interest, when followed by any of the amino acids Gly, Glu, Asp, Val, or Ala, can be substituted to eliminate a start codon in the second frame. The sequence GATGGG encodes the amino acids Asp-Gly and forms an ATG in reading frame “b”. When the sequence is modified to GACGGG, the ATG start is eliminated and the sequence still encodes Asp-Gly. A similar strategy is used to eliminate start codons in reading frame “c”. The combination of an amino acid selected from the group of Gly, Glu, Val, Ala, Arg, Lys, Ile, Thr, Cys, Tyr, Leu, Ser, His or Pro followed by Trp can result in formation of an ATG in the third reading frame of the gene. In this situation, the first codon can be changed to have a nucleotide other than A in the third position.
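Out-of-frame start codons can be located by recording the position of each ATG modulo three. The sketch below assumes the sequence begins at the first nucleotide of frame “a”; the function name is illustrative.

```python
def start_codons_by_frame(seq):
    """Map forward frames 'a', 'b', 'c' to the 0-based positions of each ATG."""
    hits = {"a": [], "b": [], "c": []}
    pos = seq.find("ATG")
    while pos != -1:
        hits["abc"[pos % 3]].append(pos)   # offset 0 -> a, 1 -> b, 2 -> c
        pos = seq.find("ATG", pos + 1)
    return hits
```

Applied to the Asp-Gly example, GATGGG reports one ATG in frame “b”, and GACGGG reports none.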

The elimination of ATG codons in the complementary DNA strand of the gene in alternate frames (“d”, “e”, and/or “f”) without changing the amino acid sequence of the protein can be accomplished in a similar manner. This modification reduces the probability of translation even if the transgene is integrated into a plant genome in an orientation that allows transcription of the reverse complement mRNA from a native plant promoter. Translation from any reverse reading frame can be minimized by introduction of a stop codon in all three reverse reading frames as described below.

The creation of stop codons in all three frames of the complementary DNA can be accomplished as follows. The Leu codons (TTA and CTA) and the Ser codon (TCA) produce three different stop codons in the reverse complement strand. If those amino acids can be found at the C terminus of the protein of interest, their codons may be used to generate stops in the complementary strand in reading frame “d”. To generate a stop codon in reading frame “e” of the complementary strand, find the amino acids Ala, Arg, Asn, Asp, Cys, Gly, His, Ile, Leu, Phe, Pro, Ser, Thr, Tyr, or Val followed by the amino acids Gln, His or Tyr in the protein of interest. For example, the polynucleotide sequence GCCCAC that encodes the amino acids Ala-His can be modified to GCTCAC. The complementary sequence, GTGAGC, will now contain the stop codon TGA. When the protein of interest has an Ala, Ile, Leu, Phe, Pro, Ser, Thr or Val followed by an Arg, Asn, Ile, Lys, Met, or Ser, the sequence can be modified to have a stop codon in reading frame “f” of the complementary strand. The polynucleotide sequence ATATCT for Ile and Ser can be modified to ATCAGT to generate a stop codon in the complementary strand, ACTGAT, which contains the stop codon TGA. The combination of codons for Phe followed by any of the codons for the amino acids Asn, Ile, Lys, Met or Thr will always generate a stop codon in complementary strand frame “e” or “f”.
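The two worked examples can be verified by taking the reverse complement and searching it for stop codons. The sketch below reports positions on the reverse complement strand without assigning them to frames “d”, “e” or “f”, since that labeling depends on where the full coding sequence ends; the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def reverse_strand_stops(seq):
    """(position, codon) of every stop codon on the reverse complement, any frame."""
    rc = reverse_complement(seq)
    return [(i, rc[i:i + 3]) for i in range(len(rc) - 2) if rc[i:i + 3] in STOPS]
```

For the Ala-His example, GCCCAC yields no stop codon on the complementary strand, while GCTCAC yields TGA within GTGAGC.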

To create a stop codon in the forward reading frame “b”, a codon in reading frame “a” must end in the nucleotides TA or TG. Search the protein of interest for the amino acids Ile, Leu, Met or Val in combination with any of the following amino acids: Ala, Arg, Asn, Asp, Glu, Gly, Ile, Lys, Met, Ser, Thr or Val. For example, if the polynucleotide sequence encoding the amino acids Met-Ser is ATGTCT, it can be modified to ATGAGT to produce a TGA stop codon in the second reading frame.

To create a stop codon in reading frame “c”, a codon in reading frame “a” must have the nucleotide T in the third position and the next codon must start with AA, AG or GA. To find suitable codons to modify, search the protein of interest for any of the amino acids Ala, Asn, Asp, Arg, Cys, Gly, His, Ile, Leu, Phe, Pro, Ser, Thr, Tyr or Val followed by any of the following amino acids: Arg, Asn, Asp, Glu, Lys or Ser. For example, if the nucleotide sequence for the amino acids Gly-Glu is GGAGAG, the sequence can be modified to GGTGAG to create a TGA stop codon in the third reading frame.
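The Met-Ser and Gly-Glu examples can be checked by reading the sequence at an offset of one or two nucleotides. The sketch below assumes the input starts at the first nucleotide of frame “a”; the function name is illustrative.

```python
STOPS = {"TAA", "TAG", "TGA"}


def stops_in_frame(seq, frame):
    """0-based positions of stop codons in forward frame 'a', 'b' or 'c'."""
    offset = "abc".index(frame)            # a -> 0, b -> 1, c -> 2
    return [i for i in range(offset, len(seq) - 2, 3) if seq[i:i + 3] in STOPS]
```

ATGAGT carries a TGA in frame “b” and GGTGAG carries a TGA in frame “c”, whereas the unmodified ATGTCT and GGAGAG carry neither.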

Another useful modification in the artificial polynucleotide design methods of the present invention is to eliminate unwanted restriction sites and other specific sequence patterns. Restriction sites may interfere with future gene cloning and manipulations. For example, some restriction sites commonly used in gene cloning include, but are not limited to, the Type II restriction enzymes with 6 or more non-N bases listed in Table 5 below, which is an excerpt from the New England Biolabs, Inc. (Beverly, Mass., USA) restriction endonuclease database. The search for restriction enzyme recognition sites can be done using the Map function application found in GCG SeqLab or a similar application contained in other DNA analysis programs known to those skilled in the art of DNA analysis. Restriction enzyme sites can also be added to the sequence to facilitate cloning. For example, the ClaI restriction site is placed in CP4EPSPS version AT (SEQ ID NO:17) and ZM (SEQ ID NO:18) to generate recombinant sequences by fragment exchange and to facilitate gene synthesis using nucleotide fragments that can be assembled into the whole gene. The transit peptide CTP2 polynucleotide sequence (SEQ ID NO:12) is connected with CP4EPSPS by an SphI restriction site to facilitate substitution of CTP2 with different nucleotide versions of CTP2 (SEQ ID NO:13, SEQ ID NO:14) or polynucleotides encoding different chloroplast transit peptides. For example, in the rice EPSPS, the NgoMIV restriction site is preserved at about nucleotide position 205 in all artificial versions to facilitate chloroplast transit peptide coding region exchange. Also, for soybean EPSPS the polynucleotide sequence for the chloroplast transit peptide is separated from the mature peptide by the restriction site for SacII endonuclease.

It is understood that modification of endonuclease restriction sites is not required, but is useful for further manipulation of the DNA molecules. Table 5 provides a list of restriction endonucleases; those of particular interest to the present invention are marked with an asterisk. Other endonuclease restriction sites desirable for elimination or addition to an artificial polynucleotide of the present invention will be apparent to those of ordinary skill in the art and are not limited to those listed in Table 5.

TABLE 5
Restriction enzymes recognition sequences

Enzymes     Recognition Sequence
AatII       G_ACGT^C
*AccI       GT^MK_AC
Acc65I      G^GTAC_C
AceII       G_CTAG^C
AclI        AA^CG_TT
AcyI        GR^CG_YC
AfeI        AGC^GCT
AflII       C^TTAA_G
*AflIII     A^CRYG_T
AgeI        A^CCGG_T
AhaIII      TTT^AAA
ApaI        G_GGCC^C
ApaLI       G^TGCA_C
ApoI        R^AATT_Y
AscI        GG^CGCG_CC
AseI        AT^TA_AT
AsiSI       GCG_AT^CGC
AsuII       TT^CG_AA
AvaI        C^YCGR_G
AvaIII      ATGCAT
AvrII       C^CTAG_G
BalI        TGG^CCA
*BamHI      G^GATC_C
BanI        G^GYRC_C
BanII       G_RGCY^C
*BbeI       G_GCGC^C
BbvCI       CC^TCA_GC
*BclI       T^GATC_A
BetI        W^CCGG_W
BfrBI       ATG^CAT
*BglII      A^GATC_T
BloHII      CTGCA^G
BlpI        GC^TNA_GC
Bme1580I    G_KGCM^C
BmgI        GKGCCC
Bpu10I      CC^TNA_GC
BsaI        GGTCTCN^NNNN_
BsaAI       YAC^GTR
BsaHI       GR^CG_YC
BsaWI       W^CCGG_W
BsbI        CAACAC
BsePI       G^CGCG_C
BseSI       G_KGCM^C
BsiI        C^ACGA_G
BsiEI       CG_RY^CG
BsiWI       C^GTAC_G
BsmI        GAATG_CN^
Bsp1286I    G_DGCH^C
Bsp1407I    T^GTAC_A
BspEI       T^CCGG_A
BspGI       CTGGAC
BspHI       T^CATG_A
BspLU11I    A^CATG_T
BspMII      T^CCGG_A
BsrBI       CCG^CTC
BsrDI       GCAATG_NN^
BsrFI       R^CCGG_Y
BsrGI       T^GTAC_A
BssHII      G^CGCG_C
BssSI       C^ACGA_G
BstAPI      GCAN_NNN^NTGC
BstBI       TT^CG_AA
BstEII      G^GTNAC_C
BstXI       CCAN_NNNN^NTGG
BstYI       R^GATC_Y
BstZ17I     GTA^TAC
Bsu36I      CC^TNA_GG
BtgI        C^CRYG_G
BtrI        CAC^GTC
BtsI        GCAGTG_NN^
CfrI        Y^GGCC_R
Cfr10I      R^CCGG_Y
*ClaI       AT^CG_AT
DraI        TTT^AAA
DraII       RG^GNC_CY
DrdII       GAACCA
DsaI        C^CRYG_G
EaeI        Y^GGCC_R
EagI        C^GGCC_G
Ecl136II    GAG^CTC
Eco47III    AGC^GCT
EcoNI       CCTNN^N_NNAGG
EcoO109I    RG^GNC_CY
*EcoRI      G^AATT_C
*EcoRV      GAT^ATC
EspI        GC^TNA_GC
*FseI       GG_CCGG^CC
*FspI       TGC^GCA
FspAI       RTGC^GCAY
GdiII       C^GGCC_R
HaeI        WGG^CCW
HaeII       R_GCGC^Y
HgiAI       G_WGCW^C
HgiCI       G^GYRC_C
HgiJII      G_RGCY^C
*HincII     GTY^RAC
HindII      GTY^RAC
*HindIII    A^AGCT_T
*HpaI       GTT^AAC
KasI        G^GCGC_C
*KpnI       G_GTAC^C
LpnI        RGC^GCY
McrI        CG_RY^CG
MfeI        C^AATT_G
*MluI       A^CGCG_T
MscI        TGG^CCA
MspA1I      CMG^CKG
MstI        TGC^GCA
NaeI        GCC^GGC
NarI        GG^CG_CC
*NcoI       C^CATG_G
*NdeI       CA^TA_TG
*NgoMIV     G^CCGG_C
*NheI       G^CTAG_C
Nli3877I    C_YCGR^G
*NotI       GC^GGCC_GC
*NruI       TCG^CGA
*NsiI       A_TGCA^T
NspI        R_CATG^Y
NspBII      CMG^CKG
*PacI       TTA_AT^TAA
*PciI       A^CATG_T
Pfl1108I    TCGTAG
*PflMI      CCAN_NNN^NTGG
PmaCI       CAC^GTG
PmeI        GTTT^AAAC
PmlI        CAC^GTG
Ppu10I      A^TGCA_T
*PpuMI      RG^GWC_CY
PshAI       GACNN^NNGTC
PsiI        TTA^TAA
*PspOMI     G^GGCC_C
PssI        RG_GNC^CY
*PstI       C_TGCA^G
*PvuI       CG_AT^CG
*PvuII      CAG^CTG
RsrII       CG^GWC_CG
*SacI       G_AGCT^C
*SacII      CC_GC^GG
*SalI       G^TCGA_C
SanDI       GG^GWC_CC
SapI        GCTCTTCN^NNN
SauI        CC^TNA_GG
SbfI        CC_TGCA^GG
*ScaI       AGT^ACT
SciI        CTC^GAG
SduI        G_DGCH^C
SexAI       A^CCWGG_T
SfcI        C^TRYA_G
SfeI        C^TRYA_G
SfiI        GGCCN_NNN^NGGCC
SfoI        GGC^GCC
SgfI        GCG_AT^CGC
SgrAI       CR^CCGG_YG
*SmaI       CCC^GGG
SmlI        C^TYRA_G
SnaI        GTATAC
*SnaBI      TAC^GTA
*SpeI       A^CTAG_T
*SphI       G_CATG^C
SplI        C^GTAC_G
SrfI        GCCC^GGGC
Sse232I     CG^CCGG_CG
Sse8387I    CC_TGCA^GG
Sse8647I    AG^GWC_CT
*SspI       AAT^ATT
*StuI       AGG^CCT
*StyI       C^CWWG_G
*SwaI       ATTT^AAAT
TatI        W^GTAC_W
UbaMI       TCCNGGA
UbaPI       CGAACG
*VspI       AT^TA_AT
*XbaI       T^CTAG_A
*XhoI       C^TCGA_G
XhoII       R^GATC_Y
*XmaI       C^CCGG_G
XmaIII      C^GGCC_G
XmnI        GAANN^NNTTC
ZraI        GAC^GTC
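Recognition sequences written with IUPAC ambiguity codes can be converted to regular expressions for a simple site search. The sketch below is an illustration only and is not the Map function of GCG SeqLab; it assumes the cut-position marks are a caret and an underscore, strips them, and searches a single strand. Non-palindromic sites would also need to be searched on the reverse complement. All names are hypothetical.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[CG]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def site_regex(recognition):
    """Compile a recognition sequence, ignoring the ^ and _ cut-position marks."""
    bases = recognition.replace("^", "").replace("_", "")
    # The lookahead allows overlapping occurrences to be reported.
    return re.compile("(?=(" + "".join(IUPAC[b] for b in bases) + "))")


def find_sites(seq, recognition):
    """0-based start positions of every match on the given strand."""
    return [m.start() for m in site_regex(recognition).finditer(seq)]
```

For example, the ClaI entry (AT^CG_AT) matches ATCGAT, and the HincII entry (GTY^RAC) matches both GTCGAC and GTTAAC.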

A pattern search may be performed to find potential destabilizing sequences and polyadenylation sites and then disrupt or eliminate them as described in U.S. Pat. No. 5,500,365. Certain long stretches of AT rich regions, e.g. the sequence motif ATTTA (or AUUUA, as it appears in RNA), have been implicated as a destabilizing sequence in mammalian cell mRNA (Shaw and Kamen, Cell 46:659-667, 1986). Many short lived mRNAs have A+T rich 3′ untranslated regions, and these regions often have the ATTTA sequence, sometimes present in multiple copies or as multimers (e.g., ATTTATTTA . . . ). Shaw and Kamen showed that the transfer of the 3′ end of an unstable mRNA to a stable RNA (globin or VA1) decreased the stable RNA's half-life dramatically. They further showed that a pentamer of ATTTA had a profound destabilizing effect on a stable message, and that this signal could exert its effect whether it is located at the 3′ end or within the coding sequence. However, the number of ATTTA sequences and/or the sequence context in which they occur also appear to be important in determining whether they function as destabilizing sequences. They also showed that a trimer of ATTTA had much less effect than a pentamer on mRNA stability and a dimer or a monomer had no effect on stability. Note that multimers of ATTTA such as a pentamer automatically create an A+T rich region. In other unstable mRNAs, the ATTTA sequence may be present in only a single copy, but it is often contained in an A+T rich region. A repeat of 11 AUUUA pentamers has been shown to target reporter transcripts for rapid degradation in plants (Ohme-Takagi et al., Proc. Nat. Acad. Sci. USA 90, 11811-11815, 1993). An ATTTA sequence can be formed by the combination of codons for the amino acids Ile (ATT) and Tyr (TAT), as shown in ATTTAT. Another example could be codons that end in AT, as in Asn, Asp, His or Tyr, followed by the TTA codon for Leu (e.g. AATTTA). Also, the codon for Phe (TTT; UUU in RNA), when placed between a codon that ends in A and a codon that starts with A, will form the ATTTA motif. To eliminate this motif, a single nucleotide change is usually sufficient, as in the example shown: GCATTTAGC is changed to GCATTCAGC or GCCTTTAGC. All three polynucleotides shown code for Ala-Phe-Ser.
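The single-nucleotide fix can be verified by confirming that the motif is gone and that the translation is unchanged. The sketch below builds the standard genetic code from the conventional TCAG ordering; the names `translate` and `has_attta` are illustrative.

```python
# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq):
    """Translate an in-frame DNA string to one-letter amino acid codes."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def has_attta(seq):
    """True if the destabilizing ATTTA motif occurs anywhere in the sequence."""
    return "ATTTA" in seq
```

GCATTTAGC contains the motif, while GCATTCAGC and GCCTTTAGC do not, and all three translate to the same three amino acids.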

More cis-acting sequences that target transcripts for rapid turnover in plants and in other systems have been identified (Abler and Green, Plant Mol. Biol. 32:63-78, 1997). Those include the DST element, which consists of three highly conserved subdomains separated by two variable regions found downstream of the stop codon of SAUR transcripts (Newman et al., Plant Cell 5: 701-714, 1993). The DST conserved sequence consists of GGAG(N₅)CATAGATTG(N₇)CATTTTGTAT, where highly conserved residues are shown in italic type. The second and third subdomains of DST elements contain residues that are invariant among DST elements and are termed ATAGAT and GTA, respectively. Both of those subdomains are necessary for DST function. New artificial polynucleotide sequences are screened for the presence of the conserved motifs of DST elements GGAG, ATAGATT, CATTT and CATTTTGTAT. Those sequences are eliminated by base substitutions of codons, preserving the protein sequence encoded by the polynucleotide. The DST sequence motifs GGAG, ATAGAT, CATTT and GTAT that appear in clusters or patterns similar to the conserved DST sequence are also eliminated by base substitutions.
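A screen for DST elements can be sketched as a regular expression for the full consensus plus a positional search for each individual motif. The pattern below is a literal reading of the consensus given above, with N taken as any of A, C, G or T; the function name is hypothetical, and deciding whether scattered motifs form a DST-like cluster is left to the reader.

```python
import re

# Literal reading of the consensus GGAG(N5)CATAGATTG(N7)CATTTTGTAT.
DST_CONSENSUS = re.compile(r"GGAG[ACGT]{5}CATAGATTG[ACGT]{7}CATTTTGTAT")
DST_MOTIFS = ("GGAG", "ATAGATT", "CATTT", "CATTTTGTAT")


def dst_report(seq):
    """Return (full consensus found?, {motif: [0-based start positions]})."""
    motif_hits = {
        motif: [i for i in range(len(seq)) if seq.startswith(motif, i)]
        for motif in DST_MOTIFS
    }
    return bool(DST_CONSENSUS.search(seq)), motif_hits
```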

Polynucleotide sequences that may possibly function as polyadenylation sites are eliminated in the new polynucleotide design (U.S. Pat. No. 5,500,365). These polyadenylation signals may not act as proper polyA sites, but instead function as aberrant sites that give rise to unstable mRNAs.

The addition of a polyadenylate string to the 3′ end of an mRNA is common to most eukaryotic mRNAs. Contained within this mRNA transcript are signals for polyadenylation and proper 3′ end formation. This processing at the 3′ end involves cleavage of the mRNA and addition of polyA to the mature 3′ end. By searching for consensus sequences near the polyA tract in both plant and animal mRNAs, it has been possible to identify consensus sequences that apparently are involved in polyA addition and 3′ end cleavage. The same consensus sequences seem to be important to both of these processes. These signals are typically a variation on the sequence AATAAA. In animal cells, some variants of this sequence that are functional have been identified; in plant cells there seems to be an extended range of functional sequences (Dean et al., Nucl. Acid Res. 14:2229-2240, 1986; Hunt, Annu. Rev. Plant Physiol. Plant Mol. Biol. 45:47-60, 1994; Rothnie, Plant Mol. Biol. 32:43-61, 1996). All of these consensus sequences are variations on AATAAA; therefore, they all are A+T rich sequences.

Typically, to obtain sufficient expression of modified transgenes in plants, an existing structural polynucleotide coding sequence (“structural gene”) that encodes the protein of interest is modified by removal of ATTTA sequences and putative polyadenylation signals by site directed mutagenesis of the DNA comprising the structural gene. Substantially all of the known polyadenylation signals and ATTTA sequences are removed in the modified polynucleotide, although enhanced expression levels are often observed with only removal of some of the above identified polyadenylation signal sequences. Alternately, if an artificial polynucleotide is prepared that encodes the subject protein, codons are selected to avoid the ATTTA sequence and putative polyadenylation signals. For purposes of the present invention, putative polyadenylation signals include, but are not necessarily limited to, AATAAA, AATAAT, AACCAA, ATATAA, AATCAA, ATACTA, ATAAAA, ATGAAA, AAGCAT, ATTAAT, ATACAT, AAAATA, ATTAAA, AATTAA, AATACA and CATAAA.
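The sixteen putative polyadenylation signals listed above lend themselves to a direct substring scan. The sketch below reports every occurrence, including overlapping ones; the constant and function names are illustrative.

```python
# The putative polyadenylation signals listed in the text.
PUTATIVE_POLYA_SIGNALS = (
    "AATAAA", "AATAAT", "AACCAA", "ATATAA", "AATCAA", "ATACTA",
    "ATAAAA", "ATGAAA", "AAGCAT", "ATTAAT", "ATACAT", "AAAATA",
    "ATTAAA", "AATTAA", "AATACA", "CATAAA",
)


def find_polya_signals(seq):
    """Return sorted (0-based position, signal) pairs for every listed hexamer."""
    hits = []
    for signal in PUTATIVE_POLYA_SIGNALS:
        pos = seq.find(signal)
        while pos != -1:
            hits.append((pos, signal))
            pos = seq.find(signal, pos + 1)
    return sorted(hits)
```

Each reported position marks a hexamer that codon selection should avoid or that a synonymous base substitution should disrupt.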

The selected DNA sequence is scanned to identify regions with greater than four consecutive adenine (A) or thymine (T) nucleotides. The A+T regions are scanned for potential plant polyadenylation signals. Although the absence of five or more consecutive A or T nucleotides eliminates most plant polyadenylation signals, if more than one of the minor polyadenylation signals is identified within ten nucleotides of each other, then the nucleotide sequence of this region is altered to remove these signals while maintaining the original encoded amino acid sequence.
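As a rough illustration of the first scan, the following Python sketch reports runs of five or more consecutive A or T nucleotides. The function name at_runs and the example sequence are assumptions made for illustration only.

```python
import re


def at_runs(seq, min_len=5):
    """Return (start, end, run) for each stretch of at least min_len
    consecutive A or T nucleotides; start is 0-based, end is exclusive.

    Hypothetical helper, not taken from the source.
    """
    pattern = re.compile("[AT]{%d,}" % min_len)
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq.upper())]


# Made-up example: one run of six A/T bases and one run of only four.
print(at_runs("GCATTTAAGCGATATCG"))
```

Only the six-base run is reported here, because the four-base run ATAT falls below the threshold of more than four consecutive A or T nucleotides.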

The next step is to consider the about 15 to about 30 or so nucleotide residues surrounding the A+T rich region. If the A+T content of the surrounding region is less than 80%, the region should be examined for polyadenylation signals. Alteration of the region based on polyadenylation signals is dependent upon (1) the number of polyadenylation signals present and (2) the presence of a major plant polyadenylation signal. The polyadenylation signals are removed by base substitution of the DNA sequence in the context of codon replacement.
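The A+T content check on the surrounding region might be sketched as follows. The helper names, the choice of 15 flanking nucleotides on each side, and the example sequence are assumptions; the text gives only an approximate range of about 15 to about 30 residues.

```python
def at_fraction(seq):
    """Fraction of A and T nucleotides in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "AT" for base in seq) / len(seq)


def flanking_window(seq, start, end, flank=15):
    """Return the A+T rich region seq[start:end] plus up to `flank`
    nucleotides on each side (hypothetical helper; flank is an assumption).
    """
    lo = max(0, start - flank)
    hi = min(len(seq), end + flank)
    return seq[lo:hi]


# Made-up example: a five-base A run at positions 5..10 with 2-base flanks.
window = flanking_window("GGGGGAAAAACCCCC", 5, 10, flank=2)
print(window, at_fraction(window) < 0.80)
```

A window whose A+T fraction is below 0.80 would, under the rule stated above, be examined further for polyadenylation signals.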

Two additional patterns not identified in U.S. Pat. No. 5,500,365 are searched for and eliminated in embodiments of the present invention. The sequences AGGTAA and GCAGGT are consensus sequences for intron 5′ and 3′ splice sites, respectively, in monocot plants and dicot plants. Only the GT of the 5′ splice site and the AG of the 3′ splice site are required to be an exact match. However, when conducting a search for these consensus sequences, no mismatch is allowed at any base.

After each step, sequence mapping is done using the MAP program from GCG to determine the location of the open reading frames and to identify sequence patterns that need further modification. The final step is to perform sequence identity analysis using, for example, the GAP program from the GCG package to determine the degree of sequence divergence and percent identity.
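For readers without access to the GCG package, the percent identity calculation can be approximated with the short Python sketch below. It is not the GAP program: it assumes the two sequences have already been aligned to equal length with "-" as the gap character, and it assumes identity is computed over columns in which neither sequence has a gap. Other definitions of the denominator are in use, so results may differ from GAP output.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over ungapped columns of a pairwise alignment.

    Hypothetical helper; both inputs must already be aligned.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Made-up six-base example with one mismatch.
print(percent_identity("ATGGCT", "ATGGCA"))
```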

Polypeptides

Generally, the translated protein of the artificial polynucleotide will have the same amino acid sequence as the protein translated from the unmodified coding region. However, the substitution of codons that encode amino acids that provide a functional homologue of the protein is an aspect of the invention. For example, certain amino acids may be substituted for other amino acids in a protein structure without appreciable loss of interactive binding capacity with structures such as, for example, antigen-binding regions of antibodies or binding sites on substrate molecules. Since it is the interactive capacity and nature of a protein that defines that protein's biological functional activity, certain amino acid sequence substitutions can be made in a protein sequence, and, of course, its underlying DNA coding sequence, and nevertheless obtain a protein with like properties. It is thus contemplated by the inventors that various changes may be made in the peptide sequences of the disclosed compositions by making changes in the corresponding DNA sequences that encode the peptides, in which the peptides show no appreciable loss of their biological utility or activity.

A further aspect of the invention comprises functional homologues, which differ in one or more amino acids from those of a polypeptide provided herein as the result of one or more conservative amino acid substitutions. It is well known in the art that one or more amino acids in a native sequence can be substituted with at least one other amino acid, the charge and polarity of which are similar to that of the native amino acid, resulting in a silent change. For instance, valine is a conservative substitute for alanine and threonine is a conservative substitute for serine. Conservative substitutions for an amino acid within the native polypeptide sequence can be selected from other members of the class to which the naturally occurring amino acid belongs. Amino acids can be divided into the following four groups: (1) acidic amino acids, (2) basic amino acids, (3) neutral polar amino acids, and (4) neutral nonpolar amino acids. Representative amino acids within these various groups include, but are not limited to: (1) acidic (negatively charged) amino acids such as aspartic acid and glutamic acid; (2) basic (positively charged) amino acids such as arginine, histidine, and lysine; (3) neutral polar amino acids such as glycine, serine, threonine, cysteine, tyrosine, asparagine, and glutamine; and (4) neutral nonpolar (hydrophobic) amino acids such as alanine, leucine, isoleucine, valine, proline, phenylalanine, tryptophan, and methionine. Conserved substitutes for an amino acid within a native amino acid sequence can be selected from other members of the group to which the naturally occurring amino acid belongs. For example, a group of amino acids having aliphatic side chains is glycine, alanine, valine, leucine, and isoleucine; a group of amino acids having aliphatic-hydroxyl side chains is serine and threonine; a group of amino acids having amide-containing side chains is asparagine and glutamine; a group of amino acids having aromatic side chains is phenylalanine, tyrosine, and tryptophan; a group of amino acids having basic side chains is lysine, arginine, and histidine; and a group of amino acids having sulfur-containing side chains is cysteine and methionine. Naturally conservative amino acid substitution groups are: valine-leucine, valine-isoleucine, phenylalanine-tyrosine, lysine-arginine, alanine-valine, aspartic acid-glutamic acid, and asparagine-glutamine.
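The four-group classification above can be expressed as a simple lookup, sketched here in Python using standard one-letter amino acid codes. The names GROUPS, group_of and is_conservative are hypothetical, and the sketch covers only the four-group scheme, not the side-chain groupings or the listed substitution pairs.

```python
# Four groups as enumerated in the text, in one-letter amino acid codes.
GROUPS = {
    "acidic": {"D", "E"},
    "basic": {"R", "H", "K"},
    "neutral polar": {"G", "S", "T", "C", "Y", "N", "Q"},
    "neutral nonpolar": {"A", "L", "I", "V", "P", "F", "W", "M"},
}


def group_of(aa):
    """Return the name of the group containing amino acid aa."""
    for name, members in GROUPS.items():
        if aa.upper() in members:
            return name
    raise ValueError("unknown amino acid code: %r" % aa)


def is_conservative(native, substitute):
    """True if substitute differs from native but shares its group."""
    native, substitute = native.upper(), substitute.upper()
    return native != substitute and group_of(native) == group_of(substitute)


print(is_conservative("A", "V"))  # alanine -> valine
print(is_conservative("S", "T"))  # serine -> threonine
print(is_conservative("D", "K"))  # aspartic acid -> lysine
```

Note that the classifications in the text are not equivalent: phenylalanine-tyrosine is listed as a conservative pair and both residues share the aromatic side-chain group, yet the four-group scheme places them in different groups, so this sketch would not flag that pair.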

DNA Constructs

Exogenous genetic material may be transferred into a plant by the use of a DNA construct designed for such a purpose by methods that utilize Agrobacterium, particle bombardment or other methods known to those skilled in the art. Design of such a DNA construct is generally within the skill of the art (Plant Molecular Biology: A Laboratory Manual, eds. Clark, Springer, New York (1997)). Examples of such plants into which exogenous genetic material may be transferred include, without limitation, alfalfa, Arabidopsis, barley, Brassica, broccoli, cabbage, citrus, cotton, garlic, oat, oilseed rape, onion, canola, flax, maize, an ornamental annual and ornamental perennial plant, pea, peanut, pepper, potato, rice, rye, sorghum, soybean, strawberry, sugarcane, sugar beet, tomato, wheat, poplar, pine, fir, eucalyptus, apple, lettuce, lentils, grape, banana, tea, turf grasses, sunflower, oil palm, Phaseolus, trees, shrubs, vines, etc. It is well known that agronomically important plants comprise genotypes, varieties and cultivars, and that the methods and compositions of the present invention can be tested in these plants by those of ordinary skill in the art of plant molecular biology and plant breeding.

A large number of isolated DNA promoter molecules that are active as a genetic element of a transgene in plant cells have been described. These include the nopaline synthase (P-nos) promoter (Ebert et al., Proc. Natl. Acad. Sci. (U.S.A.) 84:5745-5749, 1987, the entirety of which is herein incorporated by reference) and the octopine synthase (P-ocs) promoter, which are carried on tumor-inducing plasmids of Agrobacterium tumefaciens; the caulimovirus promoters, such as the cauliflower mosaic virus (CaMV) 19S promoter (Lawton et al., Plant Mol. Biol. 9:315-324, 1987, the entirety of which is herein incorporated by reference) and the CaMV 35S promoter (Odell et al., Nature 313:810-812, 1985, the entirety of which is herein incorporated by reference); the figwort mosaic virus 35S promoter (U.S. Pat. No. 6,018,100, the entirety of which is herein incorporated by reference); the light-inducible promoter from the small subunit of ribulose-1,5-bis-phosphate carboxylase (ssRUBISCO); the Adh promoter (Walker et al., Proc. Natl. Acad. Sci. (U.S.A.) 84:6624-6628, 1987, the entirety of which is herein incorporated by reference); the sucrose synthase promoter (Yang et al., Proc. Natl. Acad. Sci. (U.S.A.) 87:4144-4148, 1990, the entirety of which is herein incorporated by reference); the R gene complex promoter (Chandler et al., Plant Cell 1:1175-1183, 1989, the entirety of which is herein incorporated by reference); and the chlorophyll a/b binding protein gene promoter, etc.

A variety of promoters specifically active in vegetative tissues, such as leaves, stems, roots and tubers, can be used to express the nucleic acid molecules of the present invention. Examples of tuber-specific promoters include, but are not limited to, the class I and II patatin promoters (Bevan et al., EMBO J. 8:1899-1906, 1986; Koster-Topfer et al., Mol Gen Genet. 219:390-396, 1989; Mignery et al., Gene 62:27-44, 1988; Jefferson et al., Plant Mol. Biol. 14:995-1006, 1990, herein incorporated by reference in their entireties); the promoter for the potato tuber ADPGPP genes, both the large and small subunits; the sucrose synthase promoter (Salanoubat and Belliard, Gene 60:47-56, 1987; Salanoubat and Belliard, Gene 84:181-185, 1989, herein incorporated by reference in their entireties); and the promoter for the major tuber proteins including the 22 kd protein complexes and proteinase inhibitors (Hannapel, Plant Physiol. 101:703-704, 1993, herein incorporated by reference in its entirety). Examples of leaf-specific promoters include, but are not limited to, the ribulose bisphosphate carboxylase (RbcS or RuBISCO) promoters (see, e.g., Matsuoka et al., Plant J. 6:311-319, 1994, herein incorporated by reference in its entirety); the light harvesting chlorophyll a/b binding protein gene promoter (see, e.g., Shiina et al., Plant Physiol. 115:477-483, 1997; Casal et al., Plant Physiol. 116:1533-1538, 1998, herein incorporated by reference in their entireties); and the Arabidopsis thaliana myb-related gene promoter (Atmyb5) (Li et al., FEBS Lett. 379:117-121, 1996, herein incorporated by reference in its entirety). Examples of root-specific promoters include, but are not limited to, the promoter for the acid chitinase gene (Samac et al., Plant Mol. Biol. 25:587-596, 1994, herein incorporated by reference in its entirety); the root specific subdomains of the CaMV35S promoter that have been identified (Lam et al., Proc. Natl. Acad. Sci. (U.S.A.) 86:7890-7894, 1989, herein incorporated by reference in its entirety); the ORF13 promoter from Agrobacterium rhizogenes which exhibits high activity in roots (Hansen et al., Mol. Gen. Genet. 254:337-343, 1997, herein incorporated by reference in its entirety); the promoter for the tobacco root-specific gene RB7 (U.S. Pat. No. 5,750,386; Yamamoto et al., Plant Cell 3:371-382, 1991, herein incorporated by reference in its entirety); the root cell specific promoters reported by Conkling et al. (Conkling et al., Plant Physiol. 93:1203-1211, 1990, herein incorporated by reference in its entirety); and the POX1 (Pox1, pox1) promoter (Hertig, et al. Plant Mol. Biol. 16:171, 1991).

Another class of useful vegetative tissue-specific promoters is the meristematic (root tip and shoot apex) promoters. For example, the “SHOOTMERISTEMLESS” and “SCARECROW” promoters, which are active in the developing shoot or root apical meristems (Di Laurenzio et al., Cell 86:423-433, 1996; Long, Nature 379:66-69, 1996, herein incorporated by reference in their entireties), can be used. Another example of a useful promoter is that which controls the expression of the 3-hydroxy-3-methylglutaryl coenzyme A reductase HMG2 gene, whose expression is restricted to meristematic and floral (secretory zone of the stigma, mature pollen grains, gynoecium vascular tissue, and fertilized ovules) tissues (see, e.g., Enjuto et al., Plant Cell. 7:517-527, 1995, herein incorporated by reference in its entirety). Yet another example of a useful promoter is that which controls the expression of kn1-related genes from maize and other species which show meristem-specific expression (see, e.g., Granger et al., Plant Mol. Biol. 31:373-378, 1996; Kerstetter et al., Plant Cell 6:1877-1887, 1994; Hake et al., Philos. Trans. R. Soc. Lond. B. Biol. Sci. 350:45-51, 1995, herein incorporated by reference in their entireties). Another example of a meristematic promoter is the Arabidopsis thaliana KNAT1 promoter. In the shoot apex, KNAT1 transcript is localized primarily to the shoot apical meristem; the expression of KNAT1 in the shoot meristem decreases during the floral transition and is restricted to the cortex of the inflorescence stem (see, e.g., Lincoln et al., Plant Cell 6:1859-1876, 1994, herein incorporated by reference in its entirety).

Suitable seed-specific and seed-enhanced promoters can be derived from the following genes: MAC1 from maize (Sheridan et al., Genetics 142:1009-1020, 1996, herein incorporated by reference in its entirety); Cat3 from maize (Genbank No. L05934, Abler et al., Plant Mol. Biol. 22:1031-1038, 1993, herein incorporated by reference in its entirety); viviparous-1 from Arabidopsis (Genbank No. U93215); Atmyc1 from Arabidopsis (Urao et al., Plant Mol. Biol. 32:571-57, 1996; Conceicao et al., Plant 5:493-505, 1994, herein incorporated by reference in their entireties); napA from Brassica napus (Genbank No. J02798); and the napin gene family from Brassica napus (Sjodahl et al., Planta 197:264-271, 1995, herein incorporated by reference in its entirety).

The ovule-specific promoter for the BEL1 gene (Reiser et al. Cell 83:735-742, 1995, Genbank No. U39944; Ray et al, Proc. Natl. Acad. Sci. USA 91:5761-5765, 1994, all of which are herein incorporated by reference in their entireties) can also be used. The egg and central cell specific MEA (FIS1) and FIS2 promoters are also useful reproductive tissue-specific promoters (Luo et al., Proc. Natl. Acad. Sci. USA, 97:10637-10642, 2000; Vielle-Calzada, et al., Genes Dev. 13:2971-2982, 1999; herein incorporated by reference in their entireties).

A pollen-specific promoter has been identified in maize (Guerrero et al., Mol. Gen. Genet. 224:161-168, 1990, herein incorporated by reference in its entirety). Other genes specifically expressed in pollen have been described (see, e.g., Wakeley et al., Plant Mol. Biol. 37:187-192, 1998; Ficker et al., Mol. Gen. Genet. 257:132-142, 1998; Kulikauskas et al., Plant Mol. Biol. 34:809-814, 1997; Treacy et al., Plant Mol. Biol. 34:603-611, 1997; all of which are herein incorporated by reference in their entireties).

Promoters derived from genes encoding embryonic storage proteins can also be used. These genes include the gene encoding the 2S storage protein from Brassica napus (Dasgupta et al, Gene 133:301-302, 1993, herein incorporated by reference in its entirety); the 2S seed storage protein gene family from Arabidopsis; the gene encoding oleosin 20 kD from Brassica napus (GenBank No. M63985); the genes encoding oleosin A (Genbank No. U09118) and oleosin B (GenBank No. U09119) from soybean; the gene encoding oleosin from Arabidopsis (GenBank No. Z17657); the gene encoding oleosin 18 kD from maize (GenBank No. J05212, Lee, Plant Mol. Biol. 26:1981-1987, 1994, herein incorporated by reference in its entirety); and the gene encoding low molecular weight sulphur rich protein from soybean (Choi et al., Mol. Gen. Genet. 246:266-268, 1995, herein incorporated by reference in its entirety).

Promoters derived from zein encoding genes (including the 15 kD, 16 kD, 19 kD, 22 kD, 27 kD, and gamma genes; Pedersen et al., Cell 29:1015-1026, 1982, herein incorporated by reference in its entirety) can also be used. The zeins are a group of storage proteins found in maize endosperm.

Other promoters known to function, for example, in maize, include the promoters for the following genes: waxy, Brittle, Shrunken 2, Branching enzymes I and II, starch synthases, debranching enzymes, oleosins, glutelins, and sucrose synthases. A particularly preferred promoter for maize endosperm expression is the promoter for the glutelin gene from rice, more particularly the Osgt-1 promoter (Zheng et al., Mol. Cell Biol. 13:5829-5842, 1993, herein incorporated by reference in its entirety). Examples of promoters suitable for expression in wheat include those promoters for the ADPglucose pyrophosphorylase (ADPGPP) subunits, the granule bound and other starch synthases, the branching and debranching enzymes, the embryogenesis-abundant proteins, the gliadins, and the glutenins. Examples of such promoters in rice include those promoters for the ADPGPP subunits, the granule bound and other starch synthases, the branching enzymes, the debranching enzymes, sucrose synthases, and the glutelins. A particularly preferred promoter is the promoter for rice glutelin, Osgt-1. Examples of such promoters for barley include those for the ADPGPP subunits, the granule bound and other starch synthases, the branching enzymes, the debranching enzymes, sucrose synthases, the hordeins, the embryo globulins, and the aleurone specific proteins.

A tomato promoter active during fruit ripening, senescence and abscission of leaves and, to a lesser extent, of flowers can be used (Blume et al., Plant J. 12:731-746, 1997, herein incorporated by reference in its entirety). Other exemplary promoters include the pistil-specific promoter of the potato (Solanum tuberosum L.) SK2 gene, encoding a pistil-specific basic endochitinase (Ficker et al., Plant Mol. Biol. 35:425-431, 1997, herein incorporated by reference in its entirety), and the promoter of the Blec4 gene from pea (Pisum sativum cv. Alaska), which is active in epidermal tissue of vegetative and floral shoot apices of transgenic alfalfa. This makes the Blec4 promoter a useful tool to target the expression of foreign genes to the epidermal layer of actively growing shoots. The tissue-specific E8 promoter from tomato is also useful for directing gene expression in fruits (Deikman, et al., Plant Physiology 100:2013-2017, 1992).

It is further recognized that since in most cases the exact boundaries of regulatory sequences have not been completely defined, DNA fragments of different lengths may have identical promoter activity.

Promoters that are known or are found to cause transcription of DNA in plant cells can be used in the present invention. Such promoters may be obtained from a variety of sources such as plants and plant viruses. In addition to promoters that are known to cause transcription of DNA in plant cells, other promoter molecules may be identified for use in the current invention by screening a plant cDNA library for genes that are selectively or preferably expressed in the target tissues or cells and isolating the 5′ genomic region of the identified cDNAs.

Constructs or vectors may also include with the coding region of interest a polynucleic acid that acts, in whole or in part, to terminate transcription of that region. For example, such sequences have been isolated including the Tr7 3′ sequence and the nos 3′ sequence (Ingelbrecht et al., The Plant Cell 1:671-680, 1989, the entirety of which is herein incorporated by reference; Bevan et al., Nucleic Acids Res. 11:369-385, 1983, the entirety of which is herein incorporated by reference).

A vector or construct may also include regulatory elements. Examples of such include the Adh intron 1 (Callis et al., Genes and Develop. 1:1183-1200, 1987, the entirety of which is herein incorporated by reference), the sucrose synthase intron (Vasil et al., Plant Physiol. 91:1575-1579, 1989, the entirety of which is herein incorporated by reference) and the TMV omega element (Gallie et al., Plant Cell 1:301-311, 1989, the entirety of which is herein incorporated by reference). These and other regulatory elements may be included when appropriate.

A vector or construct may also include a selectable marker. Selectable markers may also be used to select for plants or plant cells that contain the exogenous genetic material. Examples of such include, but are not limited to, a neo gene (Potrykus et al., Mol. Gen. Genet. 199:183-188, 1985, the entirety of which is herein incorporated by reference) which codes for kanamycin resistance and can be selected for using kanamycin, G418, etc.; a bar gene that provides for bialaphos resistance; a mutant EPSP synthase gene (Hinchee et al., Bio/Technology 6:915-922 (1988), the entirety of which is herein incorporated by reference) that provides for glyphosate resistance; a nitrilase gene that provides for resistance to bromoxynil (Stalker et al., J. Biol. Chem. 263:6310-6314 (1988), the entirety of which is herein incorporated by reference); a mutant acetolactate synthase gene (ALS) which confers imidazolinone or sulphonylurea resistance; and a methotrexate resistant DHFR gene (Thillet et al., J. Biol. Chem. 263:12500-12508, 1988, the entirety of which is herein incorporated by reference).

A vector or construct may also include a screenable marker. Screenable markers may be used to monitor expression. Exemplary screenable markers include a β-glucuronidase or uidA gene (GUS) which encodes an enzyme for which various chromogenic substrates are known (Jefferson, Plant Mol. Biol. Rep. 5:387-405 (1987), the entirety of which is herein incorporated by reference; Jefferson et al., EMBO J. 6:3901-3907 (1987), the entirety of which is herein incorporated by reference); an R-locus gene, which encodes a product that regulates the production of anthocyanin pigments (red color) in plant tissues (Dellaporta et al., Stadler Symposium 11:263-282 (1988), the entirety of which is herein incorporated by reference); a β-lactamase gene (Sutcliffe et al., Proc. Natl. Acad. Sci. (U.S.A.) 75:3737-3741 (1978), the entirety of which is herein incorporated by reference), a gene which encodes an enzyme for which various chromogenic substrates are known (e.g., PADAC, a chromogenic cephalosporin); a luciferase gene (Ow et al., Science 234:856-859 (1986), the entirety of which is herein incorporated by reference); a xylE gene (Zukowski et al., Proc. Natl. Acad. Sci. (U.S.A.) 80:1101-1105 (1983), the entirety of which is herein incorporated by reference) which encodes a catechol dioxygenase that can convert chromogenic catechols; an α-amylase gene (Ikuta et al., Bio/Technol. 8:241-242, 1990, the entirety of which is herein incorporated by reference); a tyrosinase gene (Katz et al., J. Gen. Microbiol. 129:2703-2714, 1983, the entirety of which is herein incorporated by reference) which encodes an enzyme capable of oxidizing tyrosine to DOPA and dopaquinone, which in turn condenses to melanin; and an α-galactosidase.

Introduction of Polynucleotides into Plants

There are many methods for introducing transforming nucleic acid molecules into plant cells. Suitable methods are believed to include virtually any method by which nucleic acid molecules may be introduced into a cell, such as by Agrobacterium infection or direct delivery of nucleic acid molecules such as, for example, by PEG-mediated transformation, by electroporation or by acceleration of DNA coated particles, etc. (Potrykus, Ann. Rev. Plant Physiol. Plant Mol. Biol. 42:205-225 (1991), the entirety of which is herein incorporated by reference; Vasil, Plant Mol. Biol. 25:925-937 (1994), the entirety of which is herein incorporated by reference). For example, electroporation has been used to transform Zea mays protoplasts (Fromm et al., Nature 312:791-793, 1986, the entirety of which is herein incorporated by reference).

Other vector systems suitable for introducing transforming DNA into a host plant cell include but are not limited to binary artificial chromosome (BIBAC) vectors (Hamilton et al., Gene 200:107-116, 1997, the entirety of which is herein incorporated by reference), and transfection with RNA viral vectors (Della-Cioppa et al., Ann. N.Y. Acad. Sci. (1996), 792 pp Engineering Plants for Commercial Products and Applications, pp 57-61, the entirety of which is herein incorporated by reference).

Technology for introduction of DNA into cells is well known to those of skill in the art. Four general methods for delivering a gene into cells have been described: (1) chemical methods (Graham and van der Eb, Virology 54:536-539, 1973, the entirety of which is herein incorporated by reference); (2) physical methods such as microinjection (Capecchi, Cell 22:479-488 (1980), the entirety of which is herein incorporated by reference), electroporation (Wong and Neumann, Biochem. Biophys. Res. Commun. 107:584-587 (1982); Fromm et al., Proc. Natl. Acad. Sci. (U.S.A.) 82:5824-5828 (1985); U.S. Pat. No. 5,384,253, all of which are herein incorporated in their entirety), and the gene gun (Johnston and Tang, Methods Cell Biol. 43:353-365 (1994), the entirety of which is herein incorporated by reference); (3) viral vectors (Clapp, Clin. Perinatol. 20:155-168, 1993; Lu et al., J. Exp. Med. 178:2089-2096, 1993; Eglitis and Anderson, Biotechniques 6:608-614, 1988, all of which are herein incorporated in their entirety); and (4) receptor-mediated mechanisms (Curiel et al., Hum. Gen. Ther. 3:147-154, 1992; Wagner et al., Proc. Natl. Acad. Sci. USA 89:6099-6103, 1992, all of which are incorporated by reference in their entirety).

Acceleration methods that may be used include, for example, microprojectile bombardment and the like. One example of a method for delivering transforming nucleic acid molecules to plant cells is microprojectile bombardment. This method has been reviewed by Yang and Christou, eds., Particle Bombardment Technology for Gene Transfer, Oxford Press, Oxford, England (1994), the entirety of which is herein incorporated by reference. Non-biological particles (microprojectiles) may be coated with nucleic acids and delivered into cells by a propelling force. Exemplary particles include those comprised of tungsten, gold, platinum, and the like.

Agrobacterium-mediated transfer is a widely applicable system for introducing genes into plant cells because the DNA can be introduced into whole plant tissues, thereby bypassing the need for regeneration of an intact plant from a protoplast. The use of Agrobacterium-mediated plant integrating vectors to introduce DNA into plant cells is well known in the art. See, for example, the methods described by Fraley et al., Bio/Technology 3:629-635 (1985) and Rogers et al., Methods Enzymol. 153:253-277 (1987), both of which are herein incorporated by reference in their entirety. Further, the integration of the T-DNA is a relatively precise process resulting in few rearrangements. The region of DNA to be transferred is defined by the border sequences, and intervening DNA is usually inserted into the plant genome as described (Spielmann et al., Mol. Gen. Genet 205:34 (1986), the entirety of which is herein incorporated by reference).

A transgenic plant resulting from Agrobacterium transformation methods frequently contains a single gene on one chromosome. Such transgenic plants can be referred to as being hemizygous for the added gene. More preferred is a transgenic plant that is homozygous for the added structural gene; i.e., a transgenic plant that contains two added genes, one gene at the same locus on each chromosome of a chromosome pair. A homozygous transgenic plant can be obtained by sexually mating (selfing) an independent segregant transgenic plant that contains a single added gene, germinating some of the seed produced and analyzing the resulting plants produced for the gene of interest.
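The expected outcome of selfing a hemizygous plant can be worked through with a small Python sketch. It assumes a single transgene locus, ordinary Mendelian segregation and equal transmission of both gamete types; none of these assumptions is stated in the text, and the function name self_cross is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def self_cross(genotype=("T", "-")):
    """Expected progeny fractions, keyed by transgene copy number.

    "T" marks a chromosome carrying the transgene and "-" one without
    it; the default genotype is a hemizygous parent.
    """
    counts = Counter()
    # Each progeny receives one allele via pollen and one via the egg.
    for pollen, egg in product(genotype, repeat=2):
        copies = (pollen == "T") + (egg == "T")
        counts[copies] += 1
    total = sum(counts.values())
    return {copies: Fraction(n, total) for copies, n in counts.items()}


for copies, fraction in sorted(self_cross().items(), reverse=True):
    print(copies, "copies:", fraction)
```

Under these assumptions about one quarter of the selfed progeny are expected to be homozygous for the added gene, which is why only some of the germinated seed need be analyzed to recover a homozygous plant.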

It is also to be understood that two different transgenic plants can also be mated to produce offspring that contain two independently segregating added, exogenous genes. Selfing of appropriate progeny can produce plants that are homozygous for both added, exogenous genes that encode a polypeptide of interest. Back-crossing to a parental plant and out-crossing with a non-transgenic plant are also contemplated, as is vegetative propagation.

The regeneration, development, and cultivation of plants from single plant protoplast transformants or from various transformed explants is well known in the art (Weissbach and Weissbach (eds.), In: Methods for Plant Molecular Biology, Academic Press, Inc., San Diego, Calif. (1988), the entirety of which is herein incorporated by reference). This regeneration and growth process typically includes the steps of selecting transformed cells and culturing those individualized cells through the usual stages of embryonic development through the rooted plantlet stage. Transgenic embryos and seeds are similarly regenerated. The resulting transgenic rooted shoots are thereafter planted in an appropriate plant growth medium such as soil.

The development or regeneration of plants containing the foreign, exogenous gene that encodes a protein of interest is well known in the art. Preferably, the regenerated plants are self-pollinated to provide homozygous transgenic plants. Otherwise, pollen obtained from the regenerated plants is crossed to seed-grown plants of agronomically important lines. Conversely, pollen from plants of these important lines is used to pollinate regenerated plants. A transgenic plant of the present invention containing a desired polypeptide is cultivated using methods well known to one skilled in the art.

The present invention also provides for parts of the plants of the present invention. Plant parts, without limitation, include seed, endosperm, ovule and pollen. In a particularly preferred embodiment of the present invention, the plant part is a seed.

Methods for transforming dicots, primarily by use of Agrobacterium tumefaciens, and obtaining transgenic plants have been published, e.g., cotton (U.S. Pat. No. 5,004,863, U.S. Pat. No. 5,159,135, U.S. Pat. No. 5,518,908, all of which are herein incorporated by reference in their entirety), soybean (U.S. Pat. No. 5,569,834, the entirety of which is herein incorporated by reference) and Brassica (U.S. Pat. No. 5,463,174, the entirety of which is herein incorporated by reference).

Transformation of monocotyledons using electroporation, particle bombardment, and Agrobacterium has also been reported. For example, transformation and plant regeneration have been achieved in asparagus; barley; Zea mays (Fromm et al., Bio/Technology 8:833 (1990); Armstrong et al., Crop Science 35:550-557 (1995), all of which are herein incorporated by reference in their entirety); oat; rice; rye; sugarcane; tall fescue; and wheat (U.S. Pat. No. 5,631,152, the entirety of which is herein incorporated by reference).

In addition to the above discussed procedures, practitioners are familiar with the standard resource materials which describe specific conditions and procedures for the construction, manipulation and isolation of macromolecules (e.g., DNA molecules, plasmids, etc.), generation of recombinant organisms and the screening and isolating of clones (see, for example, Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press (1989); Maliga et al., Methods in Plant Molecular Biology, Cold Spring Harbor Press (1995), the entirety of which is herein incorporated by reference; Birren et al., Genome Analysis: Detecting Genes, 1, Cold Spring Harbor, N.Y. (1998), the entirety of which is herein incorporated by reference; Birren et al., Genome Analysis: Analyzing DNA, 2, Cold Spring Harbor, N.Y. (1998), the entirety of which is herein incorporated by reference; Plant Molecular Biology: A Laboratory Manual, eds. Clark, Springer, New York (1997), the entirety of which is herein incorporated by reference).

Having now generally described the invention, the same will be more readily understood through reference to the following examples, which are provided by way of illustration, and are not intended to be limiting of the present invention, unless specified.

EXAMPLES

Example 1

When an isolated native plant polynucleotide comprising a coding sequence is reconstructed as a transgene and then introduced into the plant by methods of plant transformation, there is a risk that expression from the endogenous homologous plant gene will interact negatively with the transgene. To avoid these negative interactions it may be necessary to provide a transgene polynucleotide substantially divergent in sequence from the native plant gene. An artificial polynucleotide molecule can be produced by the method of the present invention and used to reduce the occurrence of transgene silencing.

This example serves to illustrate methods of the present invention that result in the production of a polynucleotide encoding a modified plant EPSP synthase. The native rice (Oryza sativa) EPSPS enzyme and chloroplast transit peptide are used to construct an artificial polynucleotide molecule that also includes codons that encode substituted amino acids that do not naturally occur in the rice EPSPS enzyme. These substituted amino acids provide for a glyphosate resistant rice EPSPS enzyme (OsEPSPS_TIPS, SEQ ID NO:1).

The steps described in Table 6 are used to construct such an artificial polynucleotide sequence (OsEPSPS_AT, SEQ ID NO:3) using an Arabidopsis codon usage table and the parameters for construction of a substantially divergent polynucleotide molecule, which when expressed in plants encodes a modified rice EPSPS enzyme resistant to glyphosate herbicide. The comparison of the native rice EPSPS gene sequence referred to as OsEPSPS_Nat (SEQ ID NO:2), which has previously been modified to encode a glyphosate resistant enzyme, to the polynucleotide molecule modified for Arabidopsis codon usage, OsEPSPS_AT (SEQ ID NO:3), and to the sequence modified for Zea mays codon usage, OsEPSPS_ZM (SEQ ID NO:4), by this method is shown in FIG. 1. FIG. 1 shows the nucleotide bases changed in the modified polynucleotides compared to OsEPSPS_Nat, SEQ ID NO:2.

TABLE 6 Polynucleotide design for a modified rice EPSP synthase (OsEPSPS_AT)

1. Substitute amino acids at positions 173 and 177 to provide a modified rice EPSPS enzyme resistant to glyphosate herbicide, shown in SEQ ID NO: 1.
2. Back translate SEQ ID NO: 1 to generate an artificial polynucleotide sequence using the Arabidopsis thaliana codon usage table (Table 2).
3. Perform sequence alignment with the native OsEPSPS polynucleotide sequence (SEQ ID NO: 2) and the artificial polynucleotide sequence to determine the degree of sequence identity, map open reading frames, select patterns to search, and identify restriction enzyme recognition sequences.
4. Make corrections to the codons used in the artificial polynucleotide sequence to achieve the desired percentage of sequence identity and to avoid clustering of identical codons. This is especially important for amino acids that occur at high frequency, i.e., alanine, glycine, histidine, leucine, serine, and valine. Approximate the distribution of codon usage in the polynucleotide sequence according to the Arabidopsis codon usage table (Table 2).
5. The polynucleotide sequence is inspected for local regions that have a GC:AT ratio higher than about 2 over a range of about 50 contiguous nucleotides. The polynucleotide sequence is adjusted as necessary, by substituting codons in these regions, such that the local GC:AT ratio is less than about 2 and the GC:AT ratio of the entire polynucleotide is in the range of 0.9-1.3.
6. Introduce stop codons to translation frames “b”, “c”, “d”, “e” and “f”. Translation stop codons are created in the “b”, “c”, “d”, “e” and “f” translational frames by replacing one or more codons within about 130 base pairs (bps) of the ends of the artificial polynucleotide with codons that create a stop codon without changing the amino acid coding sequence of frame one.
7. Eliminate ATG codons from forward (frames “b” and “c”) and reverse open reading frames (frames “d”, “e”, “f”). The forward and reverse reading frames are inspected for the presence of ATG codons. Any ATG codons in frames “b” and “c” found in the polynucleotide sequence before the third Met in frame “a” of the polynucleotide are eliminated by replacing one or more codons that overlap the ATG, changing one of the nucleotides without changing the amino acid coding sequence of frame “a”. In the reverse frames, replacement of ATG or stop codon introduction may be done to interrupt potential reading frames.
8. Eliminate unwanted restriction enzyme recognition patterns and other specific patterns (polyadenylation, RNA splicing, sequence instability patterns). The polynucleotide sequence is inspected for the presence of any unwanted polynucleotide patterns and the patterns are disrupted by substituting codons in these regions.
9. Check sequence identity between a first polynucleotide and the artificial polynucleotide created by the method of the present invention. Eliminate sequence identity in a contiguous polynucleotide that is longer than 23 bps. It is desirable to eliminate sequence identity greater than about 15 bps. It is helpful to select from amino acids such as serine, arginine, and leucine that have 6 codons, or from amino acids with 4 codons, to eliminate sequence identity.
10. Review the artificial polynucleotide sequence resulting from any one of steps 1 to 9 for any of the sequence features identified in steps 4-9, and if the sequence does not comply with the conditions, make additional codon substitutions to the sequence until the conditions of steps 4-9 are met.
11. Construct the artificial polynucleotide molecule by methods known in the art, e.g., using PCR with a mixture of overlapping primers. The primers at the ends of the gene may contain convenient restriction sites to allow easy cloning of the gene into a selected vector. At the 5′ end, AflIII, BspHI, NcoI, NdeI, PciI, or SphI are usually most convenient inasmuch as their recognition sequences contain an ATG start codon; however, other enzymes can be used as well if a modified polynucleotide is designed to create a fusion with another polynucleotide segment, e.g., chloroplast transit peptide and EPSPS coding sequence.
12. Perform a DNA sequence analysis of the artificial polynucleotide to confirm that the synthetic construction resulted in the desired polynucleotide molecule. If errors are found, eliminate them by site directed mutagenesis, for which many methods are known to those skilled in the art of DNA mutagenesis.
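By way of illustration only, the local composition check of step 5 of Table 6 may be sketched in Python as follows. The function names, the exact window of 50 nucleotides, the exact threshold of 2, and the toy sequence are assumptions made for this sketch (the table states "about 50" and "about 2"); none of them is taken from the sequences of the invention.

```python
def gc_at_ratio(seq):
    """Return (G+C)/(A+T) for a DNA string; infinity if it has no A or T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return float("inf") if at == 0 else gc / at


def gc_rich_windows(seq, window=50, max_ratio=2.0):
    """Return 0-based start offsets of windows whose GC:AT ratio exceeds max_ratio."""
    hits = []
    last_start = max(len(seq) - window, 0)
    for start in range(last_start + 1):
        if gc_at_ratio(seq[start:start + window]) > max_ratio:
            hits.append(start)
    return hits


# Hypothetical toy sequence: 50 roughly balanced bases, then 50 G/C-only bases.
toy = "ATGC" * 12 + "AT" + "GGCC" * 12 + "GC"
flagged = gc_rich_windows(toy)
print("windows to adjust start at offsets", flagged[0], "to", flagged[-1])
```

Windows reported by such a scan would be candidates for the codon substitutions described in step 5; the scan itself does not choose the replacement codons.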

A Zea mays codon usage (Table 3) version of the glyphosate resistant rice EPSPS enzyme sequence (Oryza sativa EPSPS enzyme with TIPS mutations, SEQ ID NO:1) is made. The polynucleotide that encodes this enzyme includes codons that encode substituted amino acids that do not naturally occur in the native rice EPSPS enzyme. These substituted amino acids provide for a glyphosate resistant rice EPSPS enzyme. The steps described in Table 7 are used to construct a modified artificial polynucleotide sequence (OsEPSPS_ZM, SEQ ID NO:4) based on a Zea mays codon usage table that encodes a modified rice EPSPS enzyme resistant to glyphosate herbicide. The comparison of the OsEPSPS_Nat polynucleotide sequence (SEQ ID NO:2) to the OsEPSPS_ZM artificial polynucleotide sequence (SEQ ID NO:4) using the Zea mays codon usage is shown in FIG. 2.

TABLE 7 Polynucleotide construction for modified rice EPSP synthase (OsEPSPS_ZM)

1. Back translate SEQ ID NO: 1 to generate an artificial polynucleotide sequence using the Zea mays codon usage table (Table 3).
2. Perform sequence alignment with the native OsEPSPS polynucleotide sequence (SEQ ID NO: 2) and the artificial polynucleotide sequence to determine the degree of sequence identity, map open reading frames, select patterns to search, and identify restriction enzyme recognition sequences.
3. Make corrections to the codons used in the artificial polynucleotide sequence to achieve the desired percentage of sequence identity and to avoid clustering of identical codons. This is especially important for amino acids that occur at high frequency, i.e., alanine, glycine, histidine, leucine, serine, and valine. Approximate the distribution of codon usage in the polynucleotide sequence according to the Zea mays codon usage table (Table 3).
4. The polynucleotide sequence is inspected for local regions that have a GC:AT ratio higher than about 2 over a range of about 50 contiguous nucleotides. The polynucleotide sequence is adjusted as necessary, by substituting codons in these regions, such that the local GC:AT ratio is less than about 2 and the GC:AT ratio of the entire polynucleotide is in the range of 1.2-1.7.
5. Follow steps 6-12 of Table 6.
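Steps 6 and 7 of Table 6, which Table 7 reuses, call for inspecting all six translational frames for stop codons and ATG codons. A minimal Python sketch of such an inspection is given below for illustration. The mapping of frame letters to offsets (“a”, “b”, “c” as forward offsets 0, 1, 2 and “d”, “e”, “f” as offsets 0, 1, 2 on the reverse complement) is an assumption of this sketch, as are the function names and the toy sequence.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def codons(seq, offset):
    """Split seq into complete codons starting at the given offset."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def frame_report(seq):
    """For frames 'a'-'f', report stop codon presence and the ATG count."""
    seq = seq.upper()
    rev = reverse_complement(seq)
    strands = (seq, seq, seq, rev, rev, rev)
    offsets = (0, 1, 2, 0, 1, 2)
    report = {}
    for name, strand, offset in zip("abcdef", strands, offsets):
        cds = codons(strand, offset)
        report[name] = {
            "has_stop": any(c in STOPS for c in cds),
            "atg_count": cds.count("ATG"),
        }
    return report


# Hypothetical toy coding sequence (frame "a" reads ATG GCC ATG GCT TGA).
toy = "ATGGCCATGGCTTGA"
report = frame_report(toy)
# Non-coding frames that still lack a stop codon (step 6 candidates).
open_frames = [f for f in "bcdef" if not report[f]["has_stop"]]
# Non-coding frames that contain an ATG (step 7 candidates).
atg_frames = [f for f in "bcdef" if report[f]["atg_count"] > 0]
print(open_frames, atg_frames)
```

The sketch only locates the frames that would need attention; choosing synonymous codons that introduce a stop or remove an ATG without altering frame “a” is left to the designer.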

TABLE 8 Sequence percent identity between OsEPSPS polynucleotides.

              OsEPSPS_ZM  OsEPSPS_AT  OsEPSPS_Nat
OsEPSPS_ZM    100.00      73.51       71.58
OsEPSPS_AT                100.00      74.03
OsEPSPS_Nat                           100.00

TABLE 9 The nucleotide composition and GC:AT ratio of the modified polynucleotide sequences for the rice EPSPS gene sequence.

              A     C     G     T     GC:AT
OsEPSPS_AT    377   336   444   391   1.02
OsEPSPS_ZM    365   381   470   332   1.22

The two rice EPSPS artificial polynucleotide sequences (SEQ ID NO:3 and SEQ ID NO:4) are modified such that the percent identity is below 75 percent compared to SEQ ID NO:2 or relative to each other (Table 8). The nucleotide composition and GC:AT ratio of the polynucleotide sequences for the rice EPSPS gene sequence are shown in Table 9. These polynucleotides can be selected for use in plant expression constructs together with different regulatory elements, or they can be combined in a single plant by retransformation with a DNA construct or by methods of plant breeding. Concerns with gene silencing and recombination are reduced when DNA constructs have reduced levels of homologous DNA.
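For illustration, the GC:AT ratios reported in Table 9 can be recomputed from the nucleotide counts, and compared with the target ranges stated in Tables 6 and 7 (0.9-1.3 for the Arabidopsis codon version and 1.2-1.7 for the Zea mays codon version). The dictionary layout and function name below are assumptions of this sketch.

```python
# Nucleotide counts (A, C, G, T) transcribed from Table 9.
TABLE_9 = {
    "OsEPSPS_AT": (377, 336, 444, 391),
    "OsEPSPS_ZM": (365, 381, 470, 332),
}

# Whole-sequence target ranges from step 5 of Table 6 and step 4 of Table 7.
TARGET_RANGE = {
    "OsEPSPS_AT": (0.9, 1.3),
    "OsEPSPS_ZM": (1.2, 1.7),
}


def gc_at_from_counts(a, c, g, t):
    """Return the GC:AT ratio given the four nucleotide counts."""
    return (g + c) / (a + t)


ratios = {name: gc_at_from_counts(*counts) for name, counts in TABLE_9.items()}
for name, ratio in ratios.items():
    low, high = TARGET_RANGE[name]
    print(name, round(ratio, 2), "within range:", low <= ratio <= high)
```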

Example 2

Corn (Zea mays) has been genetically modified to have resistance to glyphosate herbicide (U.S. Pat. No. 6,040,497). These corn plants contain a transgene with a corn EPSP synthase modified for glyphosate tolerance. The methods of the present invention can be used to construct a new artificial polynucleotide encoding a corn EPSP synthase that is substantially different in percent identity to the endogenous corn EPSP synthase gene. The newly constructed corn EPSP synthase artificial polynucleotide can be used as a selectable marker during the selection of transgenic plant lines that may contain additional transgenic agronomic traits. During hybrid corn seed production, it is useful to have both parents glyphosate tolerant using non-interfering transgenes.

TABLE 10 Polynucleotide construction for modified corn EPSP synthase (ZmEPSPS_ZM, SEQ ID NO: 10)

1. Back translate SEQ ID NO: 8 to generate a polynucleotide sequence using the Zea mays codon usage table (Table 3).
2. Perform sequence alignment with the ZmEPSPS_Nat polynucleotide sequence (SEQ ID NO: 9) and the artificial polynucleotide sequence to determine the degree of sequence identity, map open reading frames, select patterns to search, and identify restriction enzyme recognition sequences.
3. Make corrections to the codons used in the artificial polynucleotide sequence to achieve the desired percentage of sequence identity and to avoid clustering of identical codons. This is especially important for amino acids that occur at high frequency, i.e., alanine, glycine, histidine, leucine, serine, and valine. Approximate the distribution of codon usage in the polynucleotide sequence according to the Zea mays codon usage table (Table 3).
4. The artificial polynucleotide sequence is inspected for local regions that have a GC:AT ratio higher than about 2 over a range of about 50 contiguous nucleotides. The polynucleotide sequence is adjusted as necessary, by substituting codons in these regions, such that the local GC:AT ratio is less than about 2 and the GC:AT ratio of the entire polynucleotide is in the range of 1.2-1.7.
5. Follow steps 6-12 of Table 6.

TABLE 11 Sequence percent identity between ZmEPSPS polynucleotides.

              ZmEPSPS_ZM  ZmEPSPS_Nat
ZmEPSPS_ZM    100.00      74.81
ZmEPSPS_Nat               100.00

The maize EPSPS gene nucleotide sequence is also modified to reduce identity between the synthetic and native genes and to maintain an overall GC:AT ratio typical for monocots. The GC:AT ratio for the ZmEPSPS_ZM sequence is 1.38. The sequence identity between the native (ZmEPSPS_Nat, SEQ ID NO:9) and synthetic (ZmEPSPS_ZM, SEQ ID NO:10) sequences is reduced to about 75%.
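Because an artificial polynucleotide made by codon substitution encodes the same polypeptide as the native sequence, the two coding sequences have the same length and can be compared position by position. The following Python sketch illustrates one simple way to compute a percent identity under that assumption. It is not the alignment program used to generate Table 11, and the two short sequences are hypothetical toy examples.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions with identical bases in two equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(seq_a.upper(), seq_b.upper()) if x == y)
    return 100.0 * matches / len(seq_a)


# Hypothetical toy sequences; both encode Met-Ala-Ser-Gly.
native = "ATGGCTTCTGGA"
artificial = "ATGGCATCAGGT"
identity = percent_identity(native, artificial)
print(identity)
```

In this toy case the three synonymous third-position changes leave 9 of 12 positions identical.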

The comparison of native polynucleotides encoding EPSPS indicates that the chloroplast transit peptide is the most divergent fragment of the gene. Similarity in the nucleotide sequence of the mature peptides is higher than 88% for the maize and rice enzymes, and some conserved regions have sequence identity as long as 50 bps. Posttranscriptional gene silencing has been observed for sequences as small as 60 nucleotides (Sijen et al., Plant Cell, 8:2277-2294, 1996; Mains, Plant Mol. Biol. 43:261-273, 2000).
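The length of the longest stretch of contiguous identity between two coding sequences, which step 9 of Table 6 limits to 23 bps, can be found with a single pass over two sequences of equal length. The Python sketch below is illustrative only; the function name and the toy sequences are assumptions, and equal-length, gap-free sequences are assumed.

```python
def longest_identical_run(seq_a, seq_b):
    """Return (length, 0-based start) of the longest run of identical positions."""
    best_len = best_start = run_len = 0
    for i, (x, y) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        if x == y:
            run_len += 1
            if run_len > best_len:
                best_len = run_len
                best_start = i - run_len + 1
        else:
            run_len = 0
    return best_len, best_start


MAX_CONTIGUOUS_IDENTITY = 23  # limit stated in step 9 of Table 6

# Hypothetical toy sequences; both encode Ala-Ser-Gly-Met-Ala.
native = "GCTTCTGGAATGGCT"
artificial = "GCATCAGGTATGGCA"
run_length, run_start = longest_identical_run(native, artificial)
needs_rework = run_length > MAX_CONTIGUOUS_IDENTITY
print(run_length, run_start, needs_rework)
```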

Example 3

Soybean (Glycine max) has been genetically modified to be tolerant to glyphosate by expression of a class II EPSPS isolated from Agrobacterium (Padgette et al. Crop Sci. 35:1451-1461, 1995). A soybean native EPSPS gene sequence has been identified and an artificial polynucleotide sequence designed using the method of the present invention. The artificial polynucleotide encodes a protein sequence that is modified to produce a glyphosate resistant EPSPS enzyme (GmEPSPS_IKS, SEQ ID NO:5) by replacing amino acids T to I, R to K and P to S within the GNAGTAMRP motif, resulting in a modified soybean EPSPS enzyme with the motif GNAGIAMKS (SEQ ID NO:34), also referred to as the IKS mutant. Expression of a modified EPSPS enzyme in the cells of a plant by transformation with a transgene plant expression cassette, which contains a polynucleotide encoding the modified EPSPS with the motif GNAGIAMKS, will confer glyphosate tolerance to the plants. Additional amino acid substitutions for the arginine (R) in the motif can also include asparagine (N).
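The three substitutions that convert the GNAGTAMRP motif to GNAGIAMKS can be written out explicitly, as in the following illustrative Python sketch. The 0-based positions within the motif and the function name are assumptions introduced for this sketch.

```python
NATIVE_MOTIF = "GNAGTAMRP"

# 0-based position within the motif -> (expected native residue, replacement).
IKS_SUBSTITUTIONS = {4: ("T", "I"), 7: ("R", "K"), 8: ("P", "S")}


def apply_substitutions(motif, substitutions):
    """Apply residue substitutions, checking each expected native residue first."""
    residues = list(motif)
    for position, (native, replacement) in substitutions.items():
        if residues[position] != native:
            raise ValueError(f"expected {native} at position {position}")
        residues[position] = replacement
    return "".join(residues)


iks_motif = apply_substitutions(NATIVE_MOTIF, IKS_SUBSTITUTIONS)
print(iks_motif)

# Variant with asparagine (N) in place of the arginine (R).
n_variant = apply_substitutions(NATIVE_MOTIF, {**IKS_SUBSTITUTIONS, 7: ("R", "N")})
print(n_variant)
```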

TABLE 12 Polynucleotide construction for modified soybean EPSP synthase gene (GmEPSPS_GM, SEQ ID NO: 7).

1. Back translate SEQ ID NO: 5 to generate an artificial polynucleotide sequence using the Glycine max codon usage table (Table 4).
2. Perform sequence alignment with the GmEPSPS_Nat polynucleotide sequence (SEQ ID NO: 6) and the artificial polynucleotide sequence to determine the degree of sequence identity, map open reading frames, select patterns to search, and identify restriction enzyme recognition sequences.
3. Make corrections to the codons used in the artificial polynucleotide sequence to achieve the desired percentage of sequence identity and to avoid clustering of identical codons. This is especially important for amino acids that occur at high frequency, i.e., alanine, glycine, histidine, leucine, serine, and valine. Approximate the distribution of codon usage in the polynucleotide sequence according to the Glycine max codon usage table (Table 4).
4. The polynucleotide sequence is inspected for local regions that have a GC:AT ratio higher than about 2 over a range of about 50 contiguous nucleotides. The polynucleotide sequence is adjusted as necessary, by substituting codons in these regions, such that the local GC:AT ratio is less than about 2 and the GC:AT ratio of the entire polynucleotide is in the range of 0.9-1.3.
5. Follow steps 6-12 of Table 6.

TABLE 13 Comparison of the sequence percent identity of the modified GmEPSPS at the polynucleotide sequence level.

              GmEPSPS_GM  GmEPSPS_Nat
GmEPSPS_GM    100.00      72.43
GmEPSPS_Nat               100.00

The soybean native EPSPS gene is modified using a soybean codon table (Table 4) and the conditions of the method of the present invention. The relative ratio of GC:AT is not changed in the modified gene; however, the sequence identity between the two is reduced to 72%.

Example 4

The native aroA polynucleotide gene isolated from Agrobacterium strain CP4 (U.S. Pat. No. 5,633,435, herein incorporated by reference in its entirety) that encodes a glyphosate resistant EPSP synthase (SEQ ID NO:15) can be modified by the method of the present invention to provide a polynucleotide that has the codon usage of Arabidopsis, Zea mays, or Glycine max. For the appropriate expression of CP4EPSPS to confer glyphosate tolerance in plants, a chloroplast transit peptide is necessarily fused to the CP4EPSPS coding sequence to target accumulation of the enzyme to the chloroplasts. The CTP2 chloroplast transit peptide is commonly used for the expression of this gene in transgenic plants (Nida et al., J. Agric. Food Chem. 44:1960-1966, 1996). The sequence of CP4EPSPS together with the CTP2 polynucleotide (SEQ ID NO:11) has been modified by the method of the present invention. Other chloroplast transit peptides known in the art can be fused to the CP4EPSPS to direct the enzyme to the chloroplasts.

TABLE 14 Polynucleotide construction for aroA:CP4 EPSP synthase coding sequence (CP4EPSPS_AT, CP4EPSPS_ZM, or CP4EPSPS_GM)

1. Place the CTP2 transit peptide sequence (SEQ ID NO: 11) in front of CP4EPSPS (SEQ ID NO: 15) as a fusion polypeptide. Back translate the fusion polypeptide to produce an artificial polynucleotide sequence using the Arabidopsis thaliana codon usage table (Table 2), or the Zea mays codon usage table (Table 3), or the Glycine max codon usage table (Table 4).
2. Perform sequence alignment with the native CTP2 (SEQ ID NO: 12) and native CP4EPSPS polynucleotide sequence (SEQ ID NO: 16) and the artificial polynucleotide sequence to determine the degree of sequence identity, map open reading frames, select patterns to search, and identify restriction enzyme recognition sequences.
3. Make corrections to the codons used in the artificial polynucleotide sequence to achieve the desired percentage of sequence identity and to avoid clustering of identical codons. This is especially important for amino acids that occur at high frequency, i.e., alanine, glycine, histidine, leucine, serine, and valine. Approximate the distribution of codon usage in the polynucleotide sequence according to the Arabidopsis thaliana codon usage table (Table 2) or the Zea mays codon usage table (Table 3), depending on the table in use.
4. The artificial polynucleotide sequence is inspected for local regions that have a GC:AT ratio higher than about 2 over a range of about 50 contiguous nucleotides. The polynucleotide sequence is adjusted as necessary, by substituting codons in these regions, such that the local GC:AT ratio is less than about 2 and the GC:AT ratio of the entire polynucleotide is in the range of 0.9-1.3 if Table 2 is used and 1.2-1.7 if Table 3 is used.
5. Follow steps 6-12 of Table 6.

TABLE 15 Comparison of the sequence percent identity of the artificial CP4EPSPS polynucleotides.

              CTP2CP4_GM  CTP2CP4_AT  CTP2CP4_ZM  CTP2CP4_Syn  CTP2CP4_NAT
CTP2CP4_GM    100.00      75.66       74.12       75.15        74.37
CTP2CP4_AT                100.00      76.13       74.56*       72.93
CTP2CP4_ZM                            100.00      77.76*       82.58
CTP2CP4_Syn                                       100.00       82.70
CTP2CP4_NAT                                                    100.00

*Percent identity relates to CP4EPSPS only and does not include the transit peptide.

TABLE 16 The nucleotide composition and GC:AT ratio of the artificial polynucleotide sequences for the CP4EPSPS gene sequence.

              A     C     G     T     GC:AT
CTP2CP4_GM    382   375   442   397   1.05
CTP2CP4_AT    369   408   469   350   1.22
CTP2CP4_ZM    312   487   577   290   1.65

The polynucleotide sequence CTP2_Nat (SEQ ID NO:12) plus CP4EPSPS_Nat (SEQ ID NO:16), designated as CTP2CP4_Nat, is compared in Table 15 to the artificial polynucleotide sequences designated as CTP2CP4_AT (CTP2_AT, SEQ ID NO:13 fused to CP4EPSPS_AT, SEQ ID NO:17) and CTP2CP4_ZM (CTP2_ZM, SEQ ID NO:14 fused to CP4EPSPS_ZM, SEQ ID NO:18) produced by the method of the present invention. The polynucleotide sequence that is the most divergent from the native sequence CTP2CP4_NAT and from CTP2CP4EPSPS_Syn is CTP2CP4_AT, having about 73% and 75% sequence identity, respectively. The CTP2CP4_ZM polynucleotide sequence has about 83% and 78% identity to CTP2CP4_Nat and CP4EPSPS_Syn, respectively.

A primary criterion for the selection of transgenes to combine in a plant is the percent identity. Table 15 can be used to select a CP4EPSPS polynucleotide molecule for plant expression cassette construction when it is known that the recipient plant will contain more than one CP4EPSPS polynucleotide. The GC:AT ratio in native CP4EPSPS is about 1.7. The artificial version with the Zea mays codon bias is produced to have a very similar GC:AT ratio. In the Arabidopsis codon version, the GC:AT ratio is decreased to about 1.2.
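One way to use Table 15 for such a selection is sketched below in Python for illustration. The selection rule, which picks the candidate whose highest identity to any polynucleotide already present in the plant is lowest, is an assumption of this sketch and not a rule stated in the specification; the function names are likewise hypothetical. The identity values are transcribed from Table 15.

```python
# Pairwise percent identities transcribed from Table 15.
# The two values marked * in the table (74.56, 77.76) cover CP4EPSPS only.
IDENTITY = {
    frozenset(("CTP2CP4_GM", "CTP2CP4_AT")): 75.66,
    frozenset(("CTP2CP4_GM", "CTP2CP4_ZM")): 74.12,
    frozenset(("CTP2CP4_GM", "CTP2CP4_Syn")): 75.15,
    frozenset(("CTP2CP4_GM", "CTP2CP4_NAT")): 74.37,
    frozenset(("CTP2CP4_AT", "CTP2CP4_ZM")): 76.13,
    frozenset(("CTP2CP4_AT", "CTP2CP4_Syn")): 74.56,
    frozenset(("CTP2CP4_AT", "CTP2CP4_NAT")): 72.93,
    frozenset(("CTP2CP4_ZM", "CTP2CP4_Syn")): 77.76,
    frozenset(("CTP2CP4_ZM", "CTP2CP4_NAT")): 82.58,
    frozenset(("CTP2CP4_Syn", "CTP2CP4_NAT")): 82.70,
}


def identity(a, b):
    """Look up the percent identity between two named polynucleotides."""
    return 100.0 if a == b else IDENTITY[frozenset((a, b))]


def least_similar(candidates, resident):
    """Pick the candidate whose highest identity to any resident is lowest."""
    return min(candidates, key=lambda c: max(identity(c, r) for r in resident))


candidates = ["CTP2CP4_GM", "CTP2CP4_AT", "CTP2CP4_ZM"]
choice = least_similar(candidates, ["CTP2CP4_NAT", "CTP2CP4_Syn"])
print(choice)
```

Percent identity is only one criterion; expression level in the target crop would be weighed alongside it.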

Gene expression is also a criterion for the selection of transgenes to be expressed. Expression of a transgene can vary in different crop plants; therefore, having several artificial polynucleotide coding sequences available for testing in different crop plants and genotypes, varieties or cultivars is an advantage and an aspect of the invention.

Example 5

The bar polynucleotide sequence (SEQ ID NO:20) encoding a phosphinothricin acetyl transferase protein (SEQ ID NO:19) has been used to genetically modify plants for resistance to glufosinate herbicide. Two new bar polynucleotide sequences have been designed using the method of the present invention. The alignment of BAR1_Nat with the two new artificial BAR1 polynucleotides is shown in FIG. 4.

TABLE 17 Polynucleotide gene construction for BAR1_AT (SEQ ID NO: 21) and BAR1_ZM (SEQ ID NO: 22)

1. Back translate SEQ ID NO: 19 to generate a polynucleotide sequence using the Arabidopsis thaliana codon usage table (Table 2) or the Zea mays codon usage table (Table 3).
2. Perform sequence alignment with the native BAR1_Nat polynucleotide sequence (SEQ ID NO: 20) and the artificial polynucleotide sequence to determine the degree of sequence identity, map open reading frames, select patterns to search, and identify restriction enzyme recognition sequences.
3. Make corrections to the codons used in the artificial polynucleotide sequence to achieve the desired percentage of sequence identity and to avoid clustering of identical codons. This is especially important for amino acids that occur at high frequency, i.e., alanine, glycine, histidine, leucine, serine, and valine. Approximate the distribution of codon usage in the polynucleotide sequence according to the Arabidopsis thaliana codon usage table (Table 2) or the Zea mays codon usage table (Table 3), depending on the table in use.
4. The artificial polynucleotide sequence is inspected for local regions that have a GC:AT ratio higher than about 2 over a range of about 50 contiguous nucleotides. The polynucleotide sequence is adjusted as necessary, by substituting codons in these regions, such that the local GC:AT ratio is less than about 2 and the GC:AT ratio of the entire polynucleotide is in the range of 0.9-1.3 if Table 2 is used and 1.2-1.7 if Table 3 is used.
5. Follow steps 6-12 of Table 6.

The sequence identity of the artificial BAR polynucleotides is in the range of 73-77% (Table 18). The native polynucleotide is highly GC rich. In the artificial version with Zea mays codon bias (BAR1_ZM) the GC:AT ratio is reduced to about 1.3, and in the artificial version with Arabidopsis codon bias (BAR1_AT) the ratio is about 1.0 (Table 19).

TABLE 18 Sequence percent identity between bar genes at the polynucleotide sequence level.

            BAR1_ZM  BAR1_AT  BAR1_Nat
BAR1_ZM     100.00   77.35    76.99
BAR1_AT              100.00   73.73
BAR1_Nat                      100.00

TABLE 19 The nucleotide composition and GC:AT ratio of the artificial polynucleotide sequences for the bar gene sequence.

            A     C     G     T     GC:AT
BAR_AT      139   130   144   139   1.01
BAR_ZM      122   156   154   120   1.28

Example 6

This example serves to illustrate DNA constructs for the expression of the artificial polynucleotides of the present invention in plants. A transgene DNA plant expression cassette comprises regulatory elements that control the transcription of an mRNA from the cassette. A plant expression cassette is constructed to include a promoter that functions in plants, operably linked to a 5′ leader region, operably linked to a DNA sequence of interest, operably linked to a 3′ termination region. These cassettes are constructed in plasmid vectors, which can then be transferred into plants by Agrobacterium mediated transformation methods or other methods known to those skilled in the art of plant transformation. The following plasmid vector constructs are described to provide examples of plasmids containing plant expression cassettes comprising the artificial polynucleotide molecules of the present invention; the invention is not limited to these examples.

The artificial polynucleotide molecules of the present invention, for example, CP4EPSPS_AT and CP4EPSPS_ZM, are synthesized using overlapping primers. The full length product is then amplified with gene specific primers containing overhangs with SphI (forward primer) and EcoRI (reverse primer). Genes are cloned into the vector pCRII-TOPO (Invitrogen, CA). The resulting plasmids pMON54949 (CP4EPSPS_AT, FIG. 6) and pMON54950 (CP4EPSPS_ZM, FIG. 7) contain the artificial polynucleotides, and these polynucleotides are sequenced using DNA sequencing methods to confirm that the modifications designed by the method of the present invention are contained in the artificial polynucleotides. In the next step, the artificial polynucleotide encoding the CTP2 chloroplast transit peptide is ligated to the 5′ end of the CP4EPSPS polynucleotides. The CaMV 35S promoter with a duplicated enhancer (P-CaMVe35S) and a rice actin 1 intron (I-OsAct1) are derived from pMON30151 (FIG. 8) by digestion with SphI and HindIII and ligated to the CTP2CP4EPSPS polynucleotides to create plasmids pMON59302 (CTP2CP4EPSPS_AT, FIG. 9) and pMON59307 (CTP2CP4EPSPS_ZM, FIG. 10).

For the expression of the new artificial polynucleotides in monocot plants, genes are placed in plant expression cassettes containing, at the 5′ end of the polynucleotide, a promoter and an intron and a 5′ untranslated region, and, at the 3′ end of the polynucleotide, a transcription termination signal. For this purpose, pMON42411 (FIG. 11), containing P-CaMV35S:en, I-HSP70, CTP2CP4_Nat and NOS 3′, is digested with NcoI and EcoRI restriction enzymes. The plasmids pMON59302 (FIG. 9) and pMON59307 (FIG. 10) are digested with the same restriction enzymes. Fragments are gel purified using a Qiagen gel purification kit and ligated to form pMON58400 (CP4EPSPS_AT, FIG. 12) and pMON58401 (CP4EPSPS_ZM, FIG. 13). An additional vector, pMON54964 (FIG. 14), containing P-OsAct1/I-OsAct1, is made by replacing P-e35S/I-Hsp70 from pMON58400 (FIG. 12) using the HindIII/NcoI fragment from pMON25455 (FIG. 15). To create a monocot expression vector containing the P-FMV promoter, pMON30152 (FIG. 16) is digested with NheI, the ends are blunted with T4 DNA polymerase in the presence of 4 dNTPs (200 μM), and the vector is then digested with NcoI. The CTP2CP4_AT or CTP2CP4_ZM DNA fragments are isolated from pMON59302 (FIG. 9) or pMON59307 (FIG. 10), respectively, by digesting with EcoRI, blunting with T4 DNA polymerase, and digesting with NcoI. Gel purified DNA fragments are ligated and new plasmids pMON54992 (CTP2CP4_AT, FIG. 17) and pMON54985 (CTP2CP4_ZM, FIG. 18) are created. In each case the successful plasmid construction is confirmed by restriction endonuclease digestion, using among others ClaI (introduced to both artificial polynucleotides) and PstI (introduced to CP4EPSPS_ZM). The CP4EPSPS_Nat present in the parental vectors has both a ClaI site and two PstI restriction sites in the coding region, at locations different from those in the artificial polynucleotides.
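A diagnostic digest of this kind depends on knowing where the recognition sequences lie in each construct. For illustration, the Python sketch below locates the recognition sequences of ClaI (ATCGAT), PstI (CTGCAG), NcoI (CCATGG) and EcoRI (GAATTC) in a DNA string. Because all four sites are palindromic, scanning one strand is sufficient. The toy insert is hypothetical and is not a sequence of any plasmid described herein, and the function name is an assumption of this sketch.

```python
# Recognition sequences of the enzymes named in the text.
SITES = {
    "ClaI": "ATCGAT",
    "PstI": "CTGCAG",
    "NcoI": "CCATGG",
    "EcoRI": "GAATTC",
}


def find_sites(seq, sites=SITES):
    """Return {enzyme: [0-based start positions]} for each recognition sequence."""
    seq = seq.upper()
    found = {}
    for enzyme, pattern in sites.items():
        positions = []
        start = seq.find(pattern)
        while start != -1:
            positions.append(start)
            start = seq.find(pattern, start + 1)  # allow overlapping matches
        found[enzyme] = positions
    return found


# Hypothetical toy insert assembled so that each site occurs exactly once.
toy_insert = "CCATGG" + "AA" + "ATCGAT" + "TT" + "CTGCAG" + "GG" + "GAATTC"
site_map = find_sites(toy_insert)
print(site_map)
```

Comparing such site maps between a native and an artificial coding sequence indicates which enzymes would give distinguishable fragment patterns.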

For the expression of the artificial CP4EPSPS polynucleotides in dicot plants, two parental vectors are used: pMON20999 (P-FMV/CTP2CP4_Syn/3′E-9, FIG. 19) and pMON45313 (P-e35S/CTP2CP4_Syn/3′E9, FIG. 20). In each plasmid, a DNA fragment containing the CTP2CP4_Syn polynucleotide is replaced with CTP2CP4_AT or CTP2CP4_ZM. To create pMON59308 (P-CaMVe35S/CTP2CP4_AT, FIG. 21) or pMON59309 (P-CaMVe35S/CTP2CP4_ZM, FIG. 22), pMON45313 is digested with NcoI and EcoRI and the DNA restriction fragments derived from the NcoI/EcoRI digest of pMON59302 (CTP2CP4_AT, FIG. 9) or pMON59307 (CTP2CP4_ZM, FIG. 10) are ligated, respectively. To create pMON59313 (P-FMV/CTP2CP4_AT/3′E9, FIG. 23) and pMON59396 (P-FMV/CTP2CP4_ZM/3′E9, FIG. 24), parental plasmid pMON20999 is digested with NcoI and BamHI to remove CTP2CP4_Syn, and the NcoI/BamHI restriction fragments derived from pMON59308 (CTP2CP4_AT, FIG. 21) or pMON59309 (CTP2CP4_ZM, FIG. 22) are ligated, respectively.

Example 7

The artificial polynucleotides are tested to determine efficacy for conferring glyphosate tolerance to transgenic plants. Five different expression cassettes (Table 20) with the new artificial CP4EPSPS polynucleotides are transformed into corn and the resulting transgenic corn plants are compared to the commercial standard (Roundup Ready® Corn 603, Monsanto Co.). The plasmid pMON25496 (FIG. 25) contained in the commercial standard has two copies of the CP4EPSPS_Nat polynucleotide, with expression driven by the P-CaMVe35S and P-OsAct1 promoters, respectively. The plasmids containing the new artificial CP4EPSPS polynucleotides contain only a single copy of the polynucleotide to be tested. The expression of these polynucleotides is driven by the P-CaMVe35S promoter with the heat shock protein intron I-Hsp70 or the P-FMV promoter with a rice sucrose synthase intron (I-OsSS). Plasmid pMON54964 contains the rice actin 1 promoter with the first native intron (U.S. Pat. No. 5,641,876, herein incorporated by reference in its entirety).

These plasmids are transformed into corn cells by an Agrobacterium mediated method and transgenic corn lines are regenerated on glyphosate selection. Transgenic corn plants can be produced by an Agrobacterium mediated transformation method. A disarmed Agrobacterium strain C58 (ABI) harboring a binary construct of the present invention is used. The construct is transferred into Agrobacterium by a triparental mating method (Ditta et al., Proc. Natl. Acad. Sci. 77:7347-7351). Liquid cultures of Agrobacterium are initiated from glycerol stocks or from a freshly streaked plate and grown overnight at 26° C.-28° C. with shaking (approximately 150 rpm) to mid-log growth phase in liquid LB medium, pH 7.0, containing the appropriate antibiotics. The Agrobacterium cells are resuspended in the inoculation medium (liquid CM4C) and the density is adjusted to an OD₆₆₀ of 1. Freshly isolated Type II immature HiIIxLH198 and HiII corn embryos are inoculated with Agrobacterium containing a construct and co-cultured for several days in the dark at 23° C. The embryos are then transferred to delay media and incubated at 28° C. for several or more days. All subsequent cultures are kept at this temperature. The embryos are transferred to a first selection medium (carbenicillin 500/0.5 mM glyphosate). Two weeks later, surviving tissue is transferred to a second selection medium (carbenicillin 500/1.0 mM glyphosate). Surviving callus is subcultured every 2 weeks until events can be identified. This may take about 3 subcultures on 1.0 mM glyphosate. Once events are identified, the tissue is bulked up for regeneration. The plantlets are transferred to MSOD media in a culture vessel and kept for two weeks. The plants with roots are then transferred into soil. Those skilled in the art of corn transformation can modify this method to provide substantially identical transgenic corn plants containing the DNA compositions of the present invention.

About 30 transgenic corn lines for each plasmid construct are tested, and the transformation efficiency and expression levels of the CP4EPSPS enzyme are shown in Table 20. The transgenic corn lines are treated with glyphosate at a rate of 64 oz/acre as young plants. The surviving plants are assayed by CP4EPSPS ELISA (Padgette et al. Crop Sci. 35:1451-1461, 1995) to determine the CP4 EPSPS protein expression levels (CP4 exp %) shown in Table 20. The level of expression is compared to the commercially available standard glyphosate tolerant corn plant (Roundup Ready® corn 603, Monsanto Co., St. Louis, Mo.) and expressed as a percent of the amount of protein expression determined in the commercial standard. Generally, more than 50% of corn lines survive the spray with 64 oz/acre glyphosate. The surviving plants are shown to have a high level of CP4EPSPS expression that ranges from about 75 to 86% of the commercial standard 603.

TABLE 20 Transformation efficiency (TE), CP4 expression (average %) derived from transformation of different CP4-alt constructs.

pMON              Promoter/Intron #    TE (%)   CP4 exp (%)*
58400 (CP4_AT)    P-CaMVe35S/IHsp70    5.4      75.5
58401 (CP4_ZM)    P-CaMVe35S/IHsp70    7.2      84.7
54964 (CP4_AT)    P-OsAct1             8.2      78.1
54985 (CP4_ZM)    P-FMV/IOsSS          11.5     85.7
54992 (CP4_AT)    P-FMV/IOsSS          11.5     78.2
nk603 (control)   P-OsAct1/P-e35S:     -        100

*CP4EPSPS expression is calculated as percent of control (603), measured on plants that survived glyphosate spray (64 oz/acre).

Example 8

Three plasmid constructs are evaluated in transgenic cotton plants (Table 21). The control construct (pMON20999) contains P-FMV/CP4EPSPS_Syn; this expression cassette is contained in the commercially available glyphosate tolerant cotton line 1445 (Roundup Ready® cotton, Monsanto Co., St. Louis, Mo.). The plasmid constructs pMON59313 and pMON59396, containing the CP4EPSPS_AT and CP4EPSPS_ZM polynucleotides, respectively, are assayed for transformation efficiency and CP4EPSPS enzyme levels relative to the commercial glyphosate tolerant expression cassette. About fifty transgenic cotton lines are evaluated for each construct. The artificial CP4EPSPS_AT polynucleotide driven by the P-FMV promoter gives a higher percentage of plants with a single insert, and an increase in expression level of the CP4EPSPS enzyme relative to the pMON20999 expression cassette as measured by ELISA.

TABLE 21 Transformation efficiency (TE), average CP4EPSPS expression in R0 cotton lines derived from transformation of different CP4EPSPS constructs.

pMON                    Promoter   TE (%)   CP4 Exp (%)
20999 (CP4EPSPS_Syn)    P-FMV      15.0     100.0
59313 (CP4EPSPS_AT)     P-FMV      15.0     116.4
59396 (CP4EPSPS_ZM)     P-FMV      16.1     52.0

Example 9

Constructs containing the artificial CP4EPSPS polynucleotides, CP4EPSPS_AT and CP4EPSPS_ZM, are evaluated in soybean (Table 22). The plasmid constructs all contain the P-FMV promoter to drive expression of the new CP4EPSPS polynucleotides and are compared to the P-FMV/CP4EPSPS_Syn contained in pMON20999. About 25 to 30 transgenic soybean plants are produced for each construct. The transformation efficiency and CP4EPSPS enzyme levels are measured. A surprisingly high expression level of CP4EPSPS protein is measured in soybean plants containing the CP4EPSPS_ZM coding sequence (Table 22).

TABLE 22
Transformation efficiency (TE), average CP4 expression derived from transformation of different CP4EPSPS constructs.

pMON               Promoter    TE (%)    CP4 Exp (%)
20999 (CP4_Syn)    P-FMV       0.55      100.0
59313 (CP4_AT)     P-FMV       0.40      66.6
54996 (CP4_ZM)     P-FMV       0.29      242.5

Example 10

Tobacco cells are transformed with three plasmid constructs containing different CP4EPSPS polynucleotide sequences and regenerated into plants. About twenty transgenic lines are evaluated from each construct. Expression from each of the CP4EPSPS polynucleotides is driven by the P-CaMVe35S duplicated enhancer promoter (Table 23). The transformation efficiency and CP4EPSPS enzyme expression level are measured. The different CP4EPSPS polynucleotide constructs are shown to perform about the same in transgenic tobacco for transformation efficiency and expression.

TABLE 23
Transformation efficiency (TE), average CP4 expression in R0 tobacco lines derived from transformation of different CP4 EPSPS constructs.

pMON                  Promoter      TE (%)    CP4 exp. (%)
59308 CP4EPSPS_AT     P-CaMVe35S    35        100.0
59309 CP4EPSPS_ZM     P-CaMVe35S    35        91.0
54313 CP4EPSPS_Syn    P-CaMVe35S    35        100.0

Example 11

Arabidopsis thaliana is transformed with four plasmid constructs by vacuum infiltration (Bechtold N, et al., CR Acad Sci Paris Sciences de la vie/life sciences 316: 1194-1199, 1993) and V1 progeny are evaluated to compare the efficacy of the different CP4EPSPS polynucleotide sequences and different promoters for use in selection of plants on glyphosate (Table 24). About 30 transgenic V1 plants (+) are produced for each construct. The constructs driven by P-CaMVe35S with the duplicated enhancer (pMON45313, pMON59308, and pMON59309) show no substantial difference in the level of expression in leaves as determined by ELISA. Plants transformed with pMON26140, which contains CP4EPSPS_Syn driven by the P-FMV promoter, show the highest expression level; the expression levels detected from the plants of the test constructs are compared to pMON26140.

TABLE 24
Evaluation of different CP4 expression cassettes in Arabidopsis

pMON                    Promoter      Plants produced    CP4 exp. (%)
45313 (CP4EPSPS_Syn)    P-CaMVe35S    +                  82.1
59308 (CP4EPSPS_AT)     P-CaMVe35S    +                  79.3
59309 (CP4EPSPS_ZM)     P-CaMVe35S    +                  77.3
26140 (CP4EPSPS_Syn)    P-FMV         +                  100.0

Example 12

Wheat plants transformed with the new CP4EPSPS polynucleotides are compared for transformation efficiency and CP4EPSPS enzyme expression determined by ELISA (Table 25). The CP4EPSPS_ZM provides at least seven times higher CP4EPSPS enzyme expression than CP4EPSPS_AT. The average expression of CP4EPSPS in leaves from wheat plants containing the CP4EPSPS_ZM polynucleotide is about 64% of that found in glyphosate resistant wheat that contains a double cassette construct, pMON30139: P-e35S/I-Hsp70/CP4EPSPS_Nat and P-OsAct1/I-OsAct1/CP4EPSPS_Nat (WO/0022704).

TABLE 25
Performance of different CP4EPSPS polynucleotides in wheat

pMON                  Promoter/Intron    TE (%)    CP4 Exp. (%)
58400 CP4EPSPS_AT     P-e35S/I-Hsp70     0.25      9.2
58401 CP4EPSPS_ZM     P-e35S/I-Hsp70     0.35      64.0
30139 CP4EPSPS_Nat    P-e35S:P-OsAct1    —         100.0

Example 13

This example serves to illustrate detection of different artificial polynucleotides in transgenic plants, specifically CP4EPSPS_AT and CP4EPSPS_ZM. The other artificial polynucleotides, OsEPSPS_AT, OsEPSPS_ZM, GmEPSPS_GM, ZmEPSPS_ZM, CTP2_AT, CTP2_ZM, Bar1_AT and Bar1_ZM, can all be specifically detected in transgenic plants by methods that provide a DNA amplicon or by hybridization of a DNA probe to a plant sample. Those skilled in the art of DNA detection can easily design primer molecules from the artificial polynucleotide sequences provided in the present invention to enable a method that will specifically detect the artificial polynucleotide in a plant sample. The use of a method or a kit that provides DNA primers or probes homologous or complementary to the artificial polynucleotides disclosed herein is an aspect of the present invention.

A DNA detection method (polymerase chain reaction, PCR) is designed to detect the artificial CP4EPSPS polynucleotides in transgenic plants. The unique sets of DNA primers shown in Table 26 are designed to amplify a specific CP4EPSPS polynucleotide and to provide distinctly sized amplicons. The amplicons differ sufficiently in polynucleotide length among the various CP4EPSPS polynucleotides to allow easy separation of the amplicons by standard agarose gel electrophoresis. The presence of more than one of the artificial polynucleotides can be detected in a plant by using a multiplex PCR method.

TABLE 26
Sequence of primers used for detection of different CP4 genes in transgenic plants.

Primer pair               Gene specificity    PCR product (bps)
SEQ ID NOs: 24 and 25     CP4EPSPS_AT         938 (940)
SEQ ID NOs: 26 and 27     CP4EPSPS_ZM         595 (600)
SEQ ID NOs: 28 and 29     CP4EPSPS_Nat        712 (710)
SEQ ID NOs: 30 and 31     CP4EPSPS_Syn        443 (440)
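The size-based discrimination among amplicons can be illustrated with a short Python sketch. It assumes that band sizes have already been estimated from an agarose gel. The expected sizes are taken from Table 26; the function name `call_amplicons` and the 5% sizing tolerance are hypothetical choices made only for illustration and are not part of the disclosed method.

```python
# Expected amplicon sizes (bp) from Table 26, keyed by CP4EPSPS polynucleotide.
EXPECTED_AMPLICON_BP = {
    "CP4EPSPS_AT": 938,
    "CP4EPSPS_ZM": 595,
    "CP4EPSPS_Nat": 712,
    "CP4EPSPS_Syn": 443,
}


def call_amplicons(observed_bp, tolerance=0.05):
    """Map observed band sizes to the polynucleotides they may indicate.

    tolerance is a hypothetical allowance for gel sizing error, expressed as
    a fraction of the expected size; it is not a value from the disclosure.
    """
    calls = set()
    for band in observed_bp:
        for name, expected in EXPECTED_AMPLICON_BP.items():
            if abs(band - expected) <= tolerance * expected:
                calls.add(name)
    return sorted(calls)


# Hypothetical multiplex result: two bands, near 940 bp and 600 bp.
print(call_amplicons([940, 600]))
```

With a 5% tolerance the four size windows do not overlap, so each band maps to at most one polynucleotide. A band outside every window returns no call rather than a guess.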

DNA primer pairs (Table 26) are used to produce an amplicon diagnostic for a specified CP4EPSPS polynucleotide contained in a transgenic plant. These primer pairs include, but are not limited to, SEQ ID NO:24 and SEQ ID NO:25 for the CP4EPSPS_AT polynucleotide; SEQ ID NO:26 and SEQ ID NO:27 for the CP4EPSPS_ZM polynucleotide; SEQ ID NO:28 and SEQ ID NO:29 for the CP4EPSPS_Nat polynucleotide; and SEQ ID NO:30 and SEQ ID NO:31 for the CP4EPSPS_Syn polynucleotide molecule. In addition to these primer pairs, any primer pair derived from SEQ ID NO:17 or SEQ ID NO:18 that, when used in a DNA amplification reaction, produces a DNA amplicon diagnostic for the respective CP4EPSPS polynucleotide is an aspect of the present invention.

The amplification conditions for this analysis are illustrated in Table 27 and Table 28; however, any modification of these conditions, including the use of fragments of the DNA molecules of the present invention or complements thereof as primer molecules, that produces an amplicon DNA molecule diagnostic for the artificial polynucleotides described herein is within the ordinary skill of the art. The DNA molecules of the present invention include at least SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:7, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:21, SEQ ID NO:22, and SEQ ID NO:35. DNA molecules that function as primer molecules in a DNA amplification method to detect the presence of the artificial polynucleotides include, but are not limited to, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30, and SEQ ID NO:31.

In a method for determining the presence of polynucleotides of the present invention, the analysis of a plant tissue DNA extract sample should include a positive control known to contain the artificial polynucleotide, a negative DNA extract control from a plant that is not transgenic or does not contain the artificial polynucleotide, and a negative control that contains no template in the DNA extract.
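The role of the three controls can be sketched as a simple decision rule. The disclosure states which controls to include but does not specify how a run is scored, so the logic below is an assumption based on common PCR practice, and the function name `interpret_run` and its return strings are hypothetical.

```python
def interpret_run(positive_ctrl_band, nontransgenic_ctrl_band,
                  no_template_ctrl_band, sample_band):
    """Score one sample given the three control results.

    Each argument is True if the diagnostic amplicon was observed in that
    reaction. The scoring rule is an illustrative assumption, not a rule
    stated in the disclosure.
    """
    # The positive control must amplify, or the reaction itself failed.
    if not positive_ctrl_band:
        return "invalid: positive control failed"
    # Either negative control amplifying suggests contamination or
    # non-specific priming, so the sample result cannot be trusted.
    if nontransgenic_ctrl_band or no_template_ctrl_band:
        return "invalid: negative control shows amplicon"
    return "detected" if sample_band else "not detected"


print(interpret_run(True, False, False, True))
```

A sample call is reported only when all three controls behave as expected; otherwise the run is flagged for repetition.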

Additional DNA primer molecules of sufficient length can be selected from SEQ ID NO:17 and SEQ ID NO:18 and conditions optimized for the production of an amplicon that may differ from the methods shown in Table 27 and Table 28, but result in an amplicon diagnostic for the artificial polynucleotides. The use of these DNA primer sequences homologous or complementary to SEQ ID NO:17 and SEQ ID NO:18, with or without modifications to the methods of Table 27 and Table 28, is within the scope of the invention. The assay for the CP4EPSPS_AT and CP4EPSPS_ZM amplicon can be performed by using a Stratagene Robocycler, MJ Engine, Perkin-Elmer 9700, or Eppendorf Mastercycler Gradient thermocycler as shown in Table 28, or by methods and apparatus known to those skilled in the art.

TABLE 27
DNA amplification procedure and reaction mixture for the confirmation of artificial EPSPS polynucleotide CP4EPSPS_AT in corn plants.

Step  Reagent                                  Amount                         Comments
1     Nuclease-free water                      add to final volume of 20 μl   —
2     10× reaction buffer (with MgCl₂)         2.0 μl                         1× final concentration of buffer,
                                                                              1.5 mM final concentration of MgCl₂
3     10 mM solution of dATP, dCTP, dGTP,      0.4 μl                         200 μM final concentration of each dNTP
      and dTTP
4     primer (SEQ ID NO: 24) (resuspended      0.4 μl                         0.2 μM final concentration
      in 1× TE buffer or nuclease-free
      water to a concentration of 10 μM)
5     primer (SEQ ID NO: 25) (resuspended      0.4 μl                         0.2 μM final concentration
      in 1× TE buffer or nuclease-free
      water to a concentration of 10 μM)
6     control primer (SEQ ID NO: 32)           0.2 μl                         0.1 μM final concentration
      (resuspended in 1× TE buffer or
      nuclease-free water to a
      concentration of 10 μM)
7     control primer (SEQ ID NO: 33)           0.2 μl                         0.1 μM final concentration
      (resuspended in 1× TE buffer or
      nuclease-free water to a
      concentration of 10 μM)
8     RNase, DNase free (500 ng/μl)            0.1 μl                         50 ng/reaction
9     REDTaq DNA polymerase (1 unit/μl)        1.0 μl (recommended to         1 unit/reaction
                                               switch pipets prior to
                                               next step)
10    Extracted DNA (template):                —
        Samples to be analyzed:
          individual leaves: 10-200 ng of genomic DNA
          pooled leaves (maximum of 50 leaves/pool): 200 ng of genomic DNA
        Negative control: 50 ng of nontransgenic plant genomic DNA
        Negative control: no template DNA
        Positive control: 5 ng plasmid DNA
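The final concentrations listed in Table 27 follow from the dilution relation C1·V1 = C2·V2 applied to the 20 μl reaction volume. The Python sketch below checks that arithmetic for the dNTP, primer, and RNase rows; the function name `final_concentration` is a hypothetical helper introduced only for this illustration.

```python
def final_concentration(stock_conc, stock_volume_ul, final_volume_ul=20.0):
    """Solve C1*V1 = C2*V2 for C2; the result has the same units as stock_conc."""
    return stock_conc * stock_volume_ul / final_volume_ul


# Step 3: 10 mM dNTP stock (10,000 uM), 0.4 ul added to a 20 ul reaction.
dntp_um = final_concentration(10_000, 0.4)

# Steps 4 and 5: 10 uM primer stock, 0.4 ul added.
primer_um = final_concentration(10, 0.4)

# Steps 6 and 7: 10 uM control primer stock, 0.2 ul added.
control_primer_um = final_concentration(10, 0.2)

# Step 8: 500 ng/ul RNase stock, 0.1 ul added, expressed as ng per reaction.
rnase_ng = 500 * 0.1

print(f"dNTP: {dntp_um:.0f} uM each")
print(f"primer: {primer_um:.1f} uM")
print(f"control primer: {control_primer_um:.1f} uM")
print(f"RNase: {rnase_ng:.0f} ng/reaction")
```

The computed values agree with the Comments column of Table 27 (200 μM, 0.2 μM, 0.1 μM, and 50 ng/reaction).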

TABLE 28
Suggested PCR parameters for different thermocyclers

Gently mix and, if needed (no hot top on thermocycler), add 1-2 drops of mineral oil on top of each reaction. Proceed with the PCR in a Stratagene Robocycler, MJ Engine, Perkin-Elmer 9700, or Eppendorf Mastercycler Gradient thermocycler using the following cycling parameters.

Cycle No.   Settings: Stratagene Robocycler
1           94° C.   3 minutes
38          94° C.   1 minute
            60° C.   1 minute
            72° C.   1 minute and 30 seconds
1           72° C.   10 minutes

Cycle No.   Settings: MJ Engine or Perkin-Elmer 9700
1           94° C.   3 minutes
38          94° C.   10 seconds
            60° C.   30 seconds
            72° C.   1 minute
1           72° C.   10 minutes

Cycle No.   Settings: Eppendorf Mastercycler Gradient
1           94° C.   3 minutes
38          94° C.   15 seconds
            60° C.   15 seconds
            72° C.   1 minute
1           72° C.   10 minutes

Note: The MJ Engine or Eppendorf Mastercycler Gradient thermocycler should be run in the calculated mode. Run the Perkin-Elmer 9700 thermocycler with the ramp speed set at maximum.
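The cycling parameters of Table 28 can be written as a small data structure, which makes the differences among instruments easy to compare. The Python sketch below transcribes the three programs and sums the programmed hold times. The names `PROGRAMS` and `total_hold_minutes` are hypothetical, and the totals exclude ramp time between temperatures, which depends on the instrument.

```python
# Each program is a list of (cycle count, [(temperature in C, hold in seconds)]),
# transcribed from Table 28.
PROGRAMS = {
    "Stratagene Robocycler": [
        (1, [(94, 180)]),
        (38, [(94, 60), (60, 60), (72, 90)]),
        (1, [(72, 600)]),
    ],
    "MJ Engine or Perkin-Elmer 9700": [
        (1, [(94, 180)]),
        (38, [(94, 10), (60, 30), (72, 60)]),
        (1, [(72, 600)]),
    ],
    "Eppendorf Mastercycler Gradient": [
        (1, [(94, 180)]),
        (38, [(94, 15), (60, 15), (72, 60)]),
        (1, [(72, 600)]),
    ],
}


def total_hold_minutes(program):
    """Sum programmed hold times in minutes; ramp time is not included."""
    seconds = sum(
        cycles * sum(hold for _temp, hold in steps)
        for cycles, steps in program
    )
    return seconds / 60


for name, program in PROGRAMS.items():
    print(f"{name}: {total_hold_minutes(program):.1f} min of programmed holds")
```

All three programs share the same 94° C. initial denaturation, 38 cycles, and 72° C. final extension; only the per-cycle hold times differ.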

All of the compositions and methods disclosed and claimed herein can be made and executed in light of the present disclosure. While the compositions and methods of this invention have been described, it will be apparent to those of skill in the art that variations may be applied to the compositions and methods and in the steps or in the sequence of steps of the methods described herein without departing from the concept, spirit and scope of the invention. More specifically, it will be apparent that certain agents which are both chemically and physiologically related may be substituted for the agents described herein while the same or similar results would be achieved. All such similar substitutes and modifications apparent to those skilled in the art are deemed to be within the spirit, scope and concept of the invention as defined by the appended claims.

All publications and patent applications cited herein are incorporated by reference in their entirety to the same extent as if each individual publication or patent application is specifically and individually indicated to be incorporated by reference.

1. A method to avoid silencing of a transgene comprising SEQ ID NO:16 operably linked to a polynucleotide encoding a chloroplast transit peptide in a transgenic plant, comprising the steps of: (a) obtaining a DNA construct comprising an artificial polynucleotide encoding a chloroplast transit peptide operably linked to SEQ ID NO:18; (b) transforming said DNA construct into a plant cell; and (c) regenerating said plant cell into a fertile transgenic plant, wherein said fertile transgenic plant comprises both said artificial polynucleotide and said transgene, and wherein silencing of said artificial polynucleotide and said transgene is avoided.

2. An artificial polynucleotide molecule comprising SEQ ID NO:18.

3. A DNA construct comprising: a promoter molecule that functions in plants, operably linked to said artificial polynucleotide molecule of claim 2.

4. A plant cell, plant or progeny thereof comprising the DNA construct of claim 3, wherein said DNA construct further comprises a polynucleotide encoding a chloroplast transit peptide operably linked to said artificial polynucleotide molecule.

5. The plant or progeny thereof of claim 4, wherein said plant is selected from the group consisting of sugarcane, wheat, corn, rice, soybean, cotton, potato, canola, turf grass, forest trees, grain sorghum, vegetable crops, ornamental plants, forage crops, and fruit crops.

6. A plant cell comprising at least two polynucleotides, wherein said two polynucleotides encode the same polypeptide and at least one of the polynucleotides is SEQ ID NO:18 operably linked to a polynucleotide encoding a chloroplast transit peptide.

7. A plant or progeny thereof comprising said plant cell of claim 6.

8. A plant cell, plant, or progeny thereof comprising an artificial polynucleotide molecule comprising SEQ ID NO:18 operably linked to a polynucleotide encoding a chloroplast transit peptide.

9. A DNA detection kit comprising at least one isolated DNA molecule, wherein said isolated DNA molecule is selected from the group consisting of SEQ ID NO:26 and SEQ ID NO:27, wherein said DNA molecule is useful as a DNA probe or DNA primer.

10. A method to avoid transgene silencing in transgenic plants comprising the steps of: (a) obtaining said plant cell of claim 6; and (b) regenerating said plant cell into a fertile transgenic plant, wherein said fertile transgenic plant comprises both said polynucleotides, and wherein silencing of said polynucleotides is avoided.